Commit Graph

836980 Commits

Author SHA1 Message Date
Yafang Shao
b6cfab7ad1 mm, compaction: some tracepoints should be defined only when CONFIG_COMPACTION is set
Only mm_compaction_isolate_{free, migrate}pages may be used when
CONFIG_COMPACTION is not set.  All others are used only when
CONFIG_COMPACTION is set.

After this change, if CONFIG_COMPACTION is not set, the tracepoints that
only work when CONFIG_COMPACTION is set will not be exposed to userspace.
Without this change, they will always be exposed in debugfs whether
CONFIG_COMPACTION is set or not.  This is an improvement.

Link: http://lkml.kernel.org/r/1552440403-11780-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Yafang Shao
b1746b991d mm: compaction: show gfp flag names in try_to_compact_pages tracepoint
Showing the gfp flag names instead of the gfp_mask makes trace more
convenient.

Link: http://lkml.kernel.org/r/1552527998-13162-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Yue Hu
2b59e01a3a mm/cma.c: fix the bitmap status to show failed allocation reason
Currently one bit in cma bitmap represents number of pages rather than
one page, cma->count means cma size in pages. So to find available pages
via find_next_zero_bit()/find_next_bit() we should use cma size not in
pages but in bits although current free pages number is correct due to
zero value of order_per_bit. Once order_per_bit is changed the bitmap
status will be incorrect.

The size input in cma_debug_show_areas() is not correct.  It will
affect the available pages at some position to debug the failure issue.

This is an example with order_per_bit = 1

Before this change:
[    4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages

After this change:
[    4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages

Obviously the bitmap status before is incorrect.

Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Qian Cai
dd7ef7bd14 mm/compaction.c: fix an undefined behaviour
In a low-memory situation, cc->fast_search_fail can keep increasing as it
is unable to find an available page to isolate in
fast_isolate_freepages().  As the result, it could trigger an error below,
so just compare with the maximum bits can be shifted first.

UBSAN: Undefined behaviour in mm/compaction.c:1160:30
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G
W    L    5.0.0+ #17
Call trace:
 dump_backtrace+0x0/0x450
 show_stack+0x20/0x2c
 dump_stack+0xc8/0x14c
 __ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4
 compaction_alloc+0x2344/0x2484
 unmap_and_move+0xdc/0x1dbc
 migrate_pages+0x274/0x1310
 compact_zone+0x26ec/0x43bc
 kcompactd+0x15b8/0x1a24
 kthread+0x374/0x390
 ret_from_fork+0x10/0x18

[akpm@linux-foundation.org: code cleanup]
Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw
Fixes: 70b44595ea ("mm, compaction: use free lists to quickly locate a migration source")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Baoquan He
d3ba3ae197 mm/memory_hotplug.c: fix the wrong usage of N_HIGH_MEMORY
In node_states_check_changes_online(), N_HIGH_MEMORY is used to substitute
ZONE_HIGHMEM directly.  This is not right.  N_HIGH_MEMORY is to mark the
memory state of node.  Here zone index is checked, which should be
compared with 'ZONE_HIGHMEM' accordingly.

Replace it with ZONE_HIGHMEM.

This is a code cleanup - no known runtime effects.

Link: http://lkml.kernel.org/r/20190320080732.14933-1-bhe@redhat.com
Fixes: 8efe33f40f ("mm/memory_hotplug.c: simplify node_states_check_changes_online")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Oscar Salvador
39186cbe65 mm,memory_hotplug: drop redundant hugepage_migration_supported check
has_unmovable_pages() already checks whether the hugetlb page supports
migration, so all non-migratable hugetlb pages should have been caught
there.  Let us drop the check from scan_movable_pages() as is redundant.

Link: http://lkml.kernel.org/r/20190320152658.10855-3-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Oscar Salvador
10eeadf304 mm,memory_hotplug: unlock 1GB-hugetlb on x86_64
On x86_64, 1GB-hugetlb pages could never be offlined due to the fact
that hugepage_migration_supported() returned false for PUD_SHIFT.
So whenever we wanted to offline a memblock containing a gigantic
hugetlb page, we never got beyond has_unmovable_pages() check.
This changed with [1], where now we also return true for PUD_SHIFT.

After that patch, the check in has_unmovable_pages() and scan_movable_pages()
returned true, but we still had a final barrier in do_migrate_range():

if (compound_order(head) > PFN_SECTION_SHIFT) {
	ret = -EBUSY;
	break;
}

This is not really nice, and we do not really need it.
It is perfectly possible to migrate a gigantic page as long as another node has
a spare gigantic page for us.
In alloc_huge_page_nodemask(), we calculate the __real__ number of free pages,
and if any, we try to dequeue one from another node.

This all works fine when we do have another node with a spare gigantic page,
but if that is not the case, alloc_huge_page_nodemask() ends up calling
alloc_migrate_huge_page() which bails out if the wanted page is gigantic.
That is mainly because finding a 1GB (or even 16GB on powerpc) contiguous
memory is quite unlikely when the system has been running for a while.

In that situation, we will keep looping forever because scan_movable_pages()
will give us the same page and we will fail again because there is no node
where we can dequeue a gigantic page from.
This is not nice, and it has been raised that we might want to treat -ENOMEM
as a fatal error in do_migrate_range(), but this has to be checked further.

Anyway, I would tend say that this is the administrator's job, to make sure
that the system can keep up with the memory to be offlined, so that would mean
that if we want to use gigantic pages, make sure that the other nodes have at
least enough gigantic pages to keep up in case we need to offline memory.

Just for the sake of completeness, this is one of the tests done:

 # echo 1 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
 # echo 1 > /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages

 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
   1
 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
   1

 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/nr_hugepages
   1
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   1

 (hugetlb1gb is a program that maps 1GB region using MAP_HUGE_1GB)

 # numactl -m 1 ./hugetlb1gb
 # cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages
   0
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   1

 # offline node1 memory
 # cat /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages
   0

[1] https://lore.kernel.org/patchwork/patch/998796/

Link: http://lkml.kernel.org/r/20190320152658.10855-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
f3b4fdb18c IB/mthca: use the new FOLL_LONGTERM flag to get_user_pages_fast()
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.

Link: http://lkml.kernel.org/r/20190328084422.29911-8-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-8-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
664b21e717 IB/qib: use the new FOLL_LONGTERM flag to get_user_pages_fast()
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.

Link: http://lkml.kernel.org/r/20190328084422.29911-7-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-7-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
9fdf4aa156 IB/hfi1: use the new FOLL_LONGTERM flag to get_user_pages_fast()
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-6-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
7af75561e1 mm/gup: add FOLL_LONGTERM capability to GUP fast
DAX pages were previously unprotected from longterm pins when users called
get_user_pages_fast().

Use the new FOLL_LONGTERM flag to check for DEVMAP pages and fall back to
regular GUP processing if a DEVMAP page is encountered.

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-5-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-5-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-5-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
73b0140bf0 mm/gup: change GUP fast to use flags rather than a write 'bool'
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
b798bec474 mm/gup: change write parameter to flags in fast walk
In order to support more options in the GUP fast walk, change the write
parameter to flags throughout the call stack.

This patch does not change functionality and passes FOLL_WRITE where write
was previously used.

Link: http://lkml.kernel.org/r/20190328084422.29911-3-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-3-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Ira Weiny
932f4a630a mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM
Pach series "Add FOLL_LONGTERM to GUP fast and use it".

HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/3/19/939

"longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and
can't move.  I've thought of a couple of alternative names but I think we
have to settle on if we are going to use FL_LAYOUT or something else to
solve the "longterm" problem.  Then I think we can change the flag to a
better name.

Secondly, it depends on how often you are registering memory.  I have
spoken with some RDMA users who consider MR in the performance path...
For the overall application performance.  I don't have the numbers as the
tests for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an aside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

This patch (of 7):

This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast().  Some callers who would like to do a longterm (user
controlled pin) of pages with the fast variant of GUP for performance
purposes.

Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.

This patch does not change any functionality.  In the short term
"longterm" or user controlled pins are unsafe for Filesystems and FS DAX
in particular has been blocked.  However, callers of get_user_pages_fast()
were not "protected".

FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.

NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages.  This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.

As a side affect some of the interfaces are cleaned up but this is not the
primary purpose of the series.

In review[1] it was asked:

<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a
misnomer.  This is really flagging a pin which is going to be given to
hardware and can't move.  I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem.  Then I think we can
change the flag to a better name.

Second, It depends on how often you are registering memory.  I have spoken
with some RDMA users who consider MR in the performance path...  For the
overall application performance.  I don't have the numbers as the tests
for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an asside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

</quote>

[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Kirill Tkhai
a222f34158 mm: generalize putback scan functions
This combines two similar functions move_active_pages_to_lru() and
putback_inactive_pages() into single move_pages_to_lru().  This remove
duplicate code and makes object file size smaller.

Before:
   text	   data	    bss	    dec	    hex	filename
  57082	   4732	    128	  61942	   f1f6	mm/vmscan.o
After:
   text	   data	    bss	    dec	    hex	filename
  55112	   4600	    128	  59840	   e9c0	mm/vmscan.o

Note, that now we are checking for !page_evictable() coming from
shrink_active_list(), which shouldn't change any behavior since that path
works with evictable pages only.

Link: http://lkml.kernel.org/r/155290129627.31489.8321971028677203248.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Kirill Tkhai
f372d89e5d mm: remove pages_to_free argument of move_active_pages_to_lru()
We may use input argument list as output argument too.  This makes the
function more similar to putback_inactive_pages().

Link: http://lkml.kernel.org/r/155290129079.31489.16180612694090502942.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Kirill Tkhai
9851ac1359 mm: move nr_deactivate accounting to shrink_active_list()
We know which LRU is not active.

[chris@chrisdown.name: fix build on !CONFIG_MEMCG]
  Link: http://lkml.kernel.org/r/20190322150513.GA22021@chrisdown.name
Link: http://lkml.kernel.org/r/155290128498.31489.18250485448913338607.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Kirill Tkhai
886cf1901d mm: move recent_rotated pages calculation to shrink_inactive_list()
Patch series "mm: Generalize putback functions"]

putback_inactive_pages() and move_active_pages_to_lru() are almost
similar, so this patchset merges them ina single function.

This patch (of 4):

The patch moves the calculation from putback_inactive_pages() to
shrink_inactive_list().  This makes putback_inactive_pages() looking more
similar to move_active_pages_to_lru().

To do that, we account activated pages in reclaim_stat::nr_activate.
Since a page may change its LRU type from anon to file cache inside
shrink_page_list() (see ClearPageSwapBacked()), we have to account pages
for the both types.  So, nr_activate becomes an array.

Previously we used nr_activate to account PGACTIVATE events, but now we
account them into pgactivate variable (since they are about number of
pages in general, not about sum of hpage_nr_pages).

Link: http://lkml.kernel.org/r/155290127956.31489.3393586616054413298.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Vlastimil Babka
63931eb975 mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()
alloc_pages_exact*() allocates a page of sufficient order and then splits
it to return only the number of pages requested.  That makes it
incompatible with __GFP_COMP, because compound pages cannot be split.

As shown by [1] things may silently work until the requested size
(possibly depending on user) stops being power of two.  Then for
CONFIG_DEBUG_VM, BUG_ON() triggers in split_page().  Without
CONFIG_DEBUG_VM, consequences are unclear.

There are several options here, none of them great:

1) Don't do the splitting when __GFP_COMP is passed, and return the
   whole compound page.  However if caller then returns it via
   free_pages_exact(), that will be unexpected and the freeing actions
   there will be wrong.

2) Warn and remove __GFP_COMP from the flags.  But the caller may have
   really wanted it, so things may break later somewhere.

3) Warn and return NULL.  However NULL may be unexpected, especially
   for small sizes.

This patch picks option 2, because as Michal Hocko put it: "callers wanted
it" is much less probable than "caller is simply confused and more gfp
flags is surely better than fewer".

[1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

Link: http://lkml.kernel.org/r/0c6393eb-b28d-4607-c386-862a71f09de6@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Matthew Wilcox
5fd4ca2d84 mm: page cache: store only head pages in i_pages
Transparent Huge Pages are currently stored in i_pages as pointers to
consecutive subpages.  This patch changes that to storing consecutive
pointers to the head page in preparation for storing huge pages more
efficiently in i_pages.

Large parts of this are "inspired" by Kirill's patch
https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

[willy@infradead.org: fix swapcache pages]
  Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
[kirill@shutemov.name: hugetlb stores pages in page cache differently]
  Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Kirill Shutemov <kirill@shutemov.name>
Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>
Tested-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Tested-by: Qian Cai <cai@lca.pw>
Cc: Hugh Dickins <hughd@google.com>
Cc: Song Liu <liu.song.a23@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Peter Xu
cefdca0a86 userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Yue Hu
f0fd50504a mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
If not find zero bit in find_next_zero_bit(), it will return the size
parameter passed in, so the start bit should be compared with bitmap_maxno
rather than cma->count.  Although getting maxchunk is working fine due to
zero value of order_per_bit currently, the operation will be stuck if
order_per_bit is set as non-zero.

Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Safonov <d.safonov@partner.samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Yafang Shao
3b775998ec include/trace/events/vmscan.h: drop zone id from kswapd tracepoints
It is not clear how the zone id is useful in kswapd tracepoints and the id
itself is not really easy to process because it depends on the
configuration (available zones).  Let's drop the id for now.  If somebody
really needs that information then the zone name should be used instead.

[mhocko@suse.com: new changelog]
Link: http://lkml.kernel.org/r/1552451813-10833-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Qian Cai
745e10146c mm/slab.c: fix an infinite loop in leaks_show()
"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled due to other CPUs running do_drain()
will keep making kmemleak_object or debug_objects_cache dirty and unable
to escape the first loop in leaks_show(),

do {
	set_store_user_clean(cachep);
	drain_cpu_caches(cachep);
	...

} while (!is_store_user_clean(cachep));

For example,

do_drain
  slabs_destroy
    slab_destroy
      kmem_cache_free
        __cache_free
          ___cache_free
            kmemleak_free_recursive
              delete_object_full
                __delete_object
                  put_object
                    free_object_rcu
                      kmem_cache_free
                        cache_free_debugcheck --> dirty kmemleak_object

One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show().  The other is to set store_user_clean
after drain_cpu_caches() which leaves a small window between
drain_cpu_caches() and set_store_user_clean() where per-CPU caches could
be dirty again lead to slightly wrong information has been stored but
could also speed up things significantly which sounds like a good
compromise.  For example,

 # cat /proc/slab_allocators
 0m42.778s # 1st approach
 0m0.737s  # 2nd approach

[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/20190411032635.10325-1-cai@lca.pw
Fixes: d31676dfde ("mm/slab: alternative implementation for DEBUG_SLAB_LEAK")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Liu Xiang
632b2ef0c7 mm/slub.c: update the comment about slab frozen
Now frozen slab can only be on the per cpu partial list.

Link: http://lkml.kernel.org/r/1554022325-11305-1-git-send-email-liu.xiang6@zte.com.cn
Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Li RongQing
517f9f1ee5 mm/slab.c: remove unneed check in cpuup_canceled
nc is a member of percpu allocation memory, and cannot be NULL.

Link: http://lkml.kernel.org/r/1553159353-5056-1-git-send-email-lirongqing@baidu.com
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Liu Xiang
a4d3f8916c slub: remove useless kmem_cache_debug() before remove_full()
When CONFIG_SLUB_DEBUG is not enabled, remove_full() is empty.
While CONFIG_SLUB_DEBUG is enabled, remove_full() can check
s->flags by itself. So kmem_cache_debug() is useless and
can be removed.

Link: http://lkml.kernel.org/r/1552577313-2830-1-git-send-email-liu.xiang6@zte.com.cn
Signed-off-by: Liu Xiang <liu.xiang6@zte.com.cn>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Tobin C. Harding
3e05617cea mm: remove stale comment from page struct
We now use the slab_list list_head instead of the lru list_head.  This
comment has become stale.

Remove stale comment from page struct slab_list list_head.

Link: http://lkml.kernel.org/r/20190402230545.2929-8-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Tobin C. Harding
16cb0ec75b slab: use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs.  We
have a list in the page structure (slab_list) that can be used for this
purpose.  Doing so makes the code cleaner since we are not overloading the
lru list.

Use the slab_list instead of the lru list for maintaining lists of slabs.

Link: http://lkml.kernel.org/r/20190402230545.2929-7-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
Tobin C. Harding
916ac05278 slub: use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs.  We
have a list in the page structure (slab_list) that can be used for this
purpose.  Doing so makes the code cleaner since we are not overloading the
lru list.

Use the slab_list instead of the lru list for maintaining lists of slabs.

Link: http://lkml.kernel.org/r/20190402230545.2929-6-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Tobin C. Harding
6dfd1b653c slub: add comments to endif pre-processor macros
SLUB allocator makes heavy use of ifdef/endif pre-processor macros.  The
pairing of these statements is at times hard to follow e.g.  if the pair
are further than a screen apart or if there are nested pairs.  We can
reduce cognitive load by adding a comment to the endif statement of form

       #ifdef CONFIG_FOO
       ...
       #endif /* CONFIG_FOO */

Add comments to endif pre-processor macros if ifdef/endif pair is not
immediately apparent.

Link: http://lkml.kernel.org/r/20190402230545.2929-5-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Tobin C. Harding
adab7b6818 slob: use slab_list instead of lru
Currently we use the page->lru list for maintaining lists of slabs.  We
have a list_head in the page structure (slab_list) that can be used for
this purpose.  Doing so makes the code cleaner since we are not
overloading the lru list.

The slab_list is part of a union within the page struct (included here
stripped down):

	union {
		struct {	/* Page cache and anonymous pages */
			struct list_head lru;
			...
		};
		struct {
			dma_addr_t dma_addr;
		};
		struct {	/* slab, slob and slub */
			union {
				struct list_head slab_list;
				struct {	/* Partial pages */
					struct page *next;
					int pages;	/* Nr of pages left */
					int pobjects;	/* Approximate count */
				};
			};
		...

Here we see that slab_list and lru are the same bits.  We can verify that
this change is safe to do by examining the object file produced from
slob.c before and after this patch is applied.

Steps taken to verify:

 1. checkout current tip of Linus' tree

    commit a667cb7a94 ("Merge branch 'akpm' (patches from Andrew)")

 2. configure and build (select SLOB allocator)

    CONFIG_SLOB=y
    CONFIG_SLAB_MERGE_DEFAULT=y

 3. dissasemble object file `objdump -dr mm/slub.o > before.s
 4. apply patch
 5. build
 6. dissasemble object file `objdump -dr mm/slub.o > after.s
 7. diff before.s after.s

Use slab_list list_head instead of the lru list_head for maintaining
lists of slabs.

Link: http://lkml.kernel.org/r/20190402230545.2929-4-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Tobin C. Harding
130e8e09e2 slob: respect list_head abstraction layer
Currently we reach inside the list_head.  This is a violation of the layer
of abstraction provided by the list_head.  It makes the code fragile.
More importantly it makes the code wicked hard to understand.

The code reaches into the list_head structure to counteract the fact that
the list _may_ have been changed during slob_page_alloc().  Instead of
this we can add a return parameter to slob_page_alloc() to signal that the
list was modified (list_del() called with page->lru to remove page from
the freelist).

This code is concerned with an optimisation that counters the tendency for
first fit allocation algorithm to fragment memory into many small chunks
at the front of the memory pool.  Since the page is only removed from the
list when an allocation uses _all_ the remaining memory in the page then
in this special case fragmentation does not occur and we therefore do not
need the optimisation.

Add a return parameter to slob_page_alloc() to signal that the allocation
used up the whole page and that the page was removed from the free list.
After calling slob_page_alloc() check the return value just added and only
attempt optimisation if the page is still on the list.

Use list_head API instead of reaching into the list_head structure to
check if sp is at the front of the list.

Link: http://lkml.kernel.org/r/20190402230545.2929-3-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Tobin C. Harding
a16b538499 list: add function list_rotate_to_front()
Patch series "mm: Use slab_list list_head instead of lru", v5.

Currently the slab allocators (ab)use the struct page 'lru' list_head.  We
have a list head for slab allocators to use, 'slab_list'.

During v2 it was noted by Christoph that the SLOB allocator was reaching
into a list_head, this version adds 2 patches to the front of the set to
fix that.

Clean up all three allocators by using the 'slab_list' list_head instead
of overloading the 'lru' list_head.

This patch (of 7):

Currently if we wish to rotate a list until a specific item is at the
front of the list we can call list_move_tail(head, list).  Note that the
arguments are the reverse way to the usual use of list_move_tail(list,
head).  This is a hack, it depends on the developer knowing how the
list_head operates internally which violates the layer of abstraction
offered by the list_head.  Also, it is not intuitive so the next developer
to come along must study list.h in order to fully understand what is meant
by the call, while this is 'good for' the developer it makes reading the
code harder.  We should have an function appropriately named that does
this if there are users for it intree.

By grep'ing the tree for list_move_tail() and list_tail() and attempting
to guess the argument order from the names it seems there is only one
place currently in the tree that does this - the slob allocatator.

Add function list_rotate_to_front() to rotate a list until the specified
item is at the front of the list.

Link: http://lkml.kernel.org/r/20190402230545.2929-2-tobin@kernel.org
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Shuning Zhang
e091eab028 ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
In some cases, ocfs2_iget() reads the data of inode, which has been
deleted for some reason.  That will make the system panic.  So We should
judge whether this inode has been deleted, and tell the caller that the
inode is a bad inode.

For example, the ocfs2 is used as the backed of nfs, and the client is
nfsv3.  This issue can be reproduced by the following steps.

on the nfs server side,
..../patha/pathb

Step 1: The process A was scheduled before calling the function fh_verify.

Step 2: The process B is removing the 'pathb', and just completed the call
to function dput.  Then the dentry of 'pathb' has been deleted from the
dcache, and all ancestors have been deleted also.  The relationship of
dentry and inode was deleted through the function hlist_del_init.  The
following is the call stack.
dentry_iput->hlist_del_init(&dentry->d_u.d_alias)

At this time, the inode is still in the dcache.

Step 3: The process A call the function ocfs2_get_dentry, which get the
inode from dcache.  Then the refcount of inode is 1.  The following is the
call stack.
nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)

Step 4: Dirty pages are flushed by bdi threads.  So the inode of 'patha'
is evicted, and this directory was deleted.  But the inode of 'pathb'
can't be evicted, because the refcount of the inode was 1.

Step 5: The process A keep running, and call the function
reconnect_path(in exportfs_decode_fh), which call function
ocfs2_get_parent of ocfs2.  Get the block number of parent
directory(patha) by the name of ...  Then read the data from disk by the
block number.  But this inode has been deleted, so the system panic.

Process A                                             Process B
1. in nfsd3_proc_getacl                   |
2.                                        |        dput
3. fh_to_dentry(ocfs2_get_dentry)         |
4. bdi flush dirty cache                  |
5. ocfs2_iget                             |

[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
Invalid dinode #580640: OCFS2_VALID_FL not set

[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
after error

[283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G        W
4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
[283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 09/21/2015
[283465.549657]  0000000000000000 ffff8800a56fb7b8 ffffffff816e839c
ffffffffa0514758
[283465.550392]  000000000008dc20 ffff8800a56fb838 ffffffff816e62d3
0000000000000008
[283465.551056]  ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8
ffff88005df9f000
[283465.551710] Call Trace:
[283465.552516]  [<ffffffff816e839c>] dump_stack+0x63/0x81
[283465.553291]  [<ffffffff816e62d3>] panic+0xcb/0x21b
[283465.554037]  [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
[283465.554882]  [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2]
[283465.555768]  [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230
[ocfs2]
[283465.556683]  [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
[283465.557408]  [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20
[ocfs2]
[283465.557973]  [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60
[ocfs2]
[283465.558525]  [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2]
[283465.559082]  [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2]
[283465.559622]  [<ffffffff81297c05>] reconnect_path+0xb5/0x300
[283465.560156]  [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0
[283465.560708]  [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[283465.561262]  [<ffffffff810a8196>] ? prepare_creds+0x26/0x110
[283465.561932]  [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd]
[283465.562862]  [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
[283465.563697]  [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
[283465.564510]  [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd]
[283465.565358]  [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30
[sunrpc]
[283465.566272]  [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc]
[283465.567155]  [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc]
[283465.568020]  [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd]
[283465.568962]  [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd]
[283465.570112]  [<ffffffff810a622b>] kthread+0xcb/0xf0
[283465.571099]  [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
[283465.572114]  [<ffffffff816f11b8>] ret_from_fork+0x58/0x90
[283465.573156]  [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180

Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: piaojun <piaojun@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Phillip Potter
9dc2108d66 ocfs2: use common file type conversion
Deduplicate the ocfs2 file type conversion implementation and remove
OCFS2_FT_* definitions - file systems that use the same file types as
defined by POSIX do not need to define their own versions and can use the
common helper functions decared in fs_types.h and implemented in
fs_types.c

Common implementation can be found via bbe7449e25 ("fs: common
implementation of file type").

Link: http://lkml.kernel.org/r/20190326213919.GA20878@pathfinder
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Joseph Qi
3fde8c60b3 MAINTAINERS: add Joseph as ocfs2 co-maintainer
I have been contributing and reviewing to the ocfs2 filesystem for recent
years and I'm willing to continue doing so.  Volunteer as a co-maintainer
for ocfs2 filesystem.

Link: http://lkml.kernel.org/r/f56d75b3-2be5-25c2-51f2-c3f5423d4f14@gmail.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: piaojun <piaojun@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Sabyasachi Gupta
e602b26ce4 arch/sh/boards/mach-dreamcast/irq.c: Remove duplicate header
Remove linux/irq.h which is included more than once.

Link: http://lkml.kernel.org/r/5c8682ef.1c69fb81.5a1ea.2e7f@mx.google.com
Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Acked-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Cyrill Gorcunov
a9e73998f9 kernel/sys.c: prctl: fix false positive in validate_prctl_map()
While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long).  These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.

As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL.  From the image dump of a program:

 | "mm_start_code": "0x400000",
 | "mm_end_code": "0x8f5fb4",
 | "mm_start_data": "0xf1bfb0",
 | "mm_end_data": "0xf1bfb0",

Thus we need to change validate_prctl_map from strictly less to less or
equal operator use.

Link: http://lkml.kernel.org/r/20190408143554.GY1421@uranus.lan
Fixes: f606b77f1a ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Kai Shen
2bf753e64b mm/hugetlb.c: don't put_page in lock of hugetlb_lock
spinlock recursion happened when do LTP test:
#!/bin/bash
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &
./runltp -p -f hugetlb &

The dtor returned by get_compound_page_dtor in __put_compound_page may be
the function of free_huge_page which will lock the hugetlb_lock, so don't
put_page in lock of hugetlb_lock.

 BUG: spinlock recursion on CPU#0, hugemmap05/1079
  lock: hugetlb_lock+0x0/0x18, .magic: dead4ead, .owner: hugemmap05/1079, .owner_cpu: 0
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xa4/0xcc
  spin_dump+0x84/0xa8
  do_raw_spin_lock+0xd0/0x108
  _raw_spin_lock+0x20/0x30
  free_huge_page+0x9c/0x260
  __put_compound_page+0x44/0x50
  __put_page+0x2c/0x60
  alloc_surplus_huge_page.constprop.19+0xf0/0x140
  hugetlb_acct_memory+0x104/0x378
  hugetlb_reserve_pages+0xe0/0x250
  hugetlbfs_file_mmap+0xc0/0x140
  mmap_region+0x3e8/0x5b0
  do_mmap+0x280/0x460
  vm_mmap_pgoff+0xf4/0x128
  ksys_mmap_pgoff+0xb4/0x258
  __arm64_sys_mmap+0x34/0x48
  el0_svc_common+0x78/0x130
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc

Link: http://lkml.kernel.org/r/b8ade452-2d6b-0372-32c2-703644032b47@huawei.com
Fixes: 9980d744a0 ("mm, hugetlb: get rid of surplus page accounting tricks")
Signed-off-by: Kai Shen <shenkai8@huawei.com>
Signed-off-by: Feilong Lin <linfeilong@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Dan Williams
fce86ff580 mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses
Starting with c6f3c5ee40 ("mm/huge_memory.c: fix modifying of page
protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls
pmdp_set_access_flags().  That helper enforces a pmd aligned @address
argument via VM_BUG_ON() assertion.

Update the implementation to take a 'struct vm_fault' argument directly
and apply the address alignment fixup internally to fix crash signatures
like:

    kernel BUG at arch/x86/mm/pgtable.c:515!
    invalid opcode: 0000 [#1] SMP NOPTI
    CPU: 51 PID: 43713 Comm: java Tainted: G           OE     4.19.35 #1
    [..]
    RIP: 0010:pmdp_set_access_flags+0x48/0x50
    [..]
    Call Trace:
     vmf_insert_pfn_pmd+0x198/0x350
     dax_iomap_fault+0xe82/0x1190
     ext4_dax_huge_fault+0x103/0x1f0
     ? __switch_to_asm+0x40/0x70
     __handle_mm_fault+0x3f6/0x1370
     ? __switch_to_asm+0x34/0x70
     ? __switch_to_asm+0x40/0x70
     handle_mm_fault+0xda/0x200
     __do_page_fault+0x249/0x4f0
     do_page_fault+0x32/0x110
     ? page_fault+0x8/0x30
     page_fault+0x1e/0x30

Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: c6f3c5ee40 ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Piotr Balcer <piotr.balcer@intel.com>
Tested-by: Yan Ma <yan.ma@intel.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Chandan Rajendra <chandan@linux.ibm.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:44 -07:00
Linus Torvalds
a13f065550 IOMMU Updates for Linux v5.2
Including:
 
 	- ATS support for ARM-SMMU-v3.
 
 	- AUX domain support in the IOMMU-API and the Intel VT-d driver.
 	  This adds support for multiple DMA address spaces per
 	  (PCI-)device. The use-case is to multiplex devices between
 	  host and KVM guests in a more flexible way than supported by
 	  SR-IOV.
 
 	- The Rest are smaller cleanups and fixes, two of which needed
 	  to be reverted after testing in linux-next.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAlzZWPkACgkQK/BELZcB
 GuPdRRAAj/RcgVn7fqmNDM02xe6C5PuwBGYkXnC+atDrTQWbFsM0JE3YTWEHJ+66
 7RMoYaksRaSBsn3QuX3b6+g6E+exhGoQ0BfkmuF8StUXAsaxvzGxvuk+cP0o4/mK
 pZkj3BddS4ycRqQPsVEbgJGRzL39dxWHe7p3/FfwgV+HzVonURFozU0HixLAoBhr
 uS0LpBiG8uGCMvO6yhTmPmfrbsSAcMivb7LlmsaykXPhjBk7kSqNgHNNx5O+HC8m
 XJdFatkxolkrN6A2FoHdP05sAXCv+uHbAGGGitYziRaXG7GBzm7Vc2LspJIml+y2
 898+MiTH1M3P0WPyDa3cfcnRc2BBuJg56emad4CcfduM9sVXI0Ol6slNAYljnSYD
 5A0CUxbrLxGUZaf6DAUJ9w5L+LhgEkXzKWEE9Nif46K4I1CFSt/d8nwB6Q5Oc/ie
 GZwTICRkMwTeqOM/CTyvwJCCwZm47AVv3qwaI0z5oDplH/bbRmNEi5WFJsgcgOnd
 GS5kmzjFBsljjDVWswgugdm7sdMSl7y88uQK9zUiG8fXgRiVUW/rENfZ1SMmVl1p
 zBQDndZmtrHm5ybe/NAZ8vaJhk4i1F3rWT0hwRZZKGDIrd/C3egnNyYkc4XeTPGe
 3il+dJleIIwOX5Fpa44XTV1rDuVOXpF5LS5NRLjhhd+XqbaXZFI=
 =HLtu
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - ATS support for ARM-SMMU-v3.

 - AUX domain support in the IOMMU-API and the Intel VT-d driver. This
   adds support for multiple DMA address spaces per (PCI-)device. The
   use-case is to multiplex devices between host and KVM guests in a
   more flexible way than supported by SR-IOV.

 - the rest are smaller cleanups and fixes, two of which needed to be
   reverted after testing in linux-next.

* tag 'iommu-updates-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (45 commits)
  Revert "iommu/amd: Flush not present cache in iommu_map_page"
  Revert "iommu/amd: Remove the leftover of bypass support"
  iommu/vt-d: Fix leak in intel_pasid_alloc_table on error path
  iommu/vt-d: Make kernel parameter igfx_off work with vIOMMU
  iommu/vt-d: Set intel_iommu_gfx_mapped correctly
  iommu/amd: Flush not present cache in iommu_map_page
  iommu/vt-d: Cleanup: no spaces at the start of a line
  iommu/vt-d: Don't request page request irq under dmar_global_lock
  iommu/vt-d: Use struct_size() helper
  iommu/mediatek: Fix leaked of_node references
  iommu/amd: Remove amd_iommu_pd_list
  iommu/arm-smmu: Log CBFRSYNRA register on context fault
  iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel
  iommu/arm-smmu-v3: Disable tagged pointers
  iommu/arm-smmu-v3: Add support for PCI ATS
  iommu/arm-smmu-v3: Link domains and devices
  iommu/arm-smmu-v3: Add a master->domain pointer
  iommu/arm-smmu-v3: Store SteamIDs in master
  iommu/arm-smmu-v3: Rename arm_smmu_master_data to arm_smmu_master
  ACPI/IORT: Check ATS capability in root complex nodes
  ...
2019-05-13 09:23:18 -04:00
Linus Torvalds
55472bae53 linux-watchdog 5.2-rc1 tag
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iEYEABECAAYFAlzZRp0ACgkQ+iyteGJfRsq1WQCfaR8jQjujN1bGWghnSWbTr01X
 O3cAn3RmJWiEtI0zmmMcRQBna8nj4cHJ
 =PuHw
 -----END PGP SIGNATURE-----

Merge tag 'linux-watchdog-5.2-rc1' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog updates from Wim Van Sebroeck:

 - a new watchdog driver for the ROHM BD70528 watchdog block

 - a new watchdog driver for the i.MX system controller watchdog

 - conversions to use device managed functions and other improvements

 - refactor watchdog_init_timeout

 - make watchdog core configurable as module

 - pretimeout governors improvements

 - a lot of other fixes

* tag 'linux-watchdog-5.2-rc1' of git://www.linux-watchdog.org/linux-watchdog: (114 commits)
  watchdog: Enforce that at least one pretimeout governor is enabled
  watchdog: stm32: add dynamic prescaler support
  watchdog: Improve Kconfig entry ordering and dependencies
  watchdog: npcm: Enable modular builds
  watchdog: Make watchdog core configurable as module
  watchdog: Move pretimeout governor configuration up
  watchdog: Use depends instead of select for pretimeout governors
  watchdog: rtd119x: drop unused module.h include
  watchdog: intel_scu: make it explicitly non-modular
  watchdog: coh901327: make it explicitly non-modular
  watchdog: ziirave_wdt: drop warning after calling watchdog_init_timeout
  watchdog: xen_wdt: drop warning after calling watchdog_init_timeout
  watchdog: stm32_iwdg: drop warning after calling watchdog_init_timeout
  watchdog: st_lpc_wdt: drop warning after calling watchdog_init_timeout
  watchdog: sp5100_tco: drop warning after calling watchdog_init_timeout
  watchdog: renesas_wdt: drop warning after calling watchdog_init_timeout
  watchdog: nic7018_wdt: drop warning after calling watchdog_init_timeout
  watchdog: ni903x_wdt: drop warning after calling watchdog_init_timeout
  watchdog: imx_sc_wdt: drop warning after calling watchdog_init_timeout
  watchdog: i6300esb: drop warning after calling watchdog_init_timeout
  ...
2019-05-13 09:20:42 -04:00
Linus Torvalds
d7a02fa0a8 This pull request contains the following changes for UBI/UBIFS
- fscrypt framework usage updates
 - One huge fix for xattr unlink
 - Cleanup of fscrypt ifdefs
 - Fix for our new UBIFS auth feature
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlzYkIgWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTRrD/99iBd4f8F0jF1wmB8/9kDAnz5s
 KaK+VtC0RVRijRijYzo+/2kDXpXEbmPycg6AVl5EfKxXCVFw1K7pQvuBX43qyv4o
 BINRv1av8FEBA9eTjvBgZJUrjB1AuvV37716/OeM2bnvuCsp1escnvTEh6S3VFYw
 oWDBgZJd+DE10CYtZjuLoyDPcYdNrzebbmu3Xbfl2XsPwZFUJIrymMd6NE8Xdk3I
 EQbZ3guEM5Djui+nrko3iKzfoZ4eK7WguO3DOEjUHpwea4ZfnZtnlH345aYOAqRE
 N5qrDCzXOsWs6Zs+clODMQgg+aTN3kGBNV534culcpMAbUp7WXynUQ1DDqtOJNJO
 pGFjhAfGi4E6YgB3UwqxMbXxI4Tg/X2ckc77hWZlC7h/1Y/i89nacT6Ij5rPNOn1
 mby1mFxWHI04uSEICWyocFK4m/J2b17Tmte2Mc5ZOigQqREUB7J8wiT4NWm6GhV1
 nTb5DA8MepC3zopbsL/iAiKPhSkH1h6AkabBw1ADTksacgNUfhjzALkxqa64tIqv
 C43QG3n/HqsNZJ4aLdizLLb8KIt4pWsIaqHOeDGSfr3I1GEBrpfKiR72P/h3fSF9
 9GIFJU5HiV+3zeAC2024muaV7KjcimZ6t/hPFTCFH9pMGNk2Mtn/gZFfmqnjLKbj
 TDxUTrZF9Lujonrbwg==
 =ymCJ
 -----END PGP SIGNATURE-----

Merge tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI/UBIFS updates from Richard Weinberger:

 - fscrypt framework usage updates

 - One huge fix for xattr unlink

 - Cleanup of fscrypt ifdefs

 - Fix for our new UBIFS auth feature

* tag 'upstream-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: wl: Fix uninitialized variable
  ubifs: Drop unnecessary setting of zbr->znode
  ubifs: Remove ifdefs around CONFIG_UBIFS_ATIME_SUPPORT
  ubifs: Remove #ifdef around CONFIG_FS_ENCRYPTION
  ubifs: Limit number of xattrs per inode
  ubifs: orphan: Handle xattrs like files
  ubifs: journal: Handle xattrs like files
  ubifs: find.c: replace swap function with built-in one
  ubifs: Do not skip hash checking in data nodes
  ubifs: work around high stack usage with clang
  ubifs: remove unused function __ubifs_shash_final
  ubifs: remove unnecessary #ifdef around fscrypt_ioctl_get_policy()
  ubifs: remove unnecessary calls to set up directory key
2019-05-12 18:16:31 -04:00
Linus Torvalds
4dbf09fea6 This pull request contains the following changes for MTD:
MTD core changes:
 - New AFS partition parser
 - Update MAINTAINERS entry
 - Use of fall-throughs markers
 
 NAND core changes:
 - Support having the bad block markers in either the first, second or
   last page of a block. The combination of all three location is now
   possible.
 - Constification of NAND_OP_PARSER(_PATTERN) elements.
 - Generic NAND DT bindings changed to yaml format (can be used to
   check the proposed bindings. First platform to be fully supported:
   sunxi.
 - Stopped using several legacy hooks.
 - Preparation to use the generic NAND layer with the addition of
   several helpers and the removal of the struct nand_chip from generic
   functions.
 - Kconfig cleanup to prepare the introduction of external ECC engines
   support.
 - Fallthrough comments.
 - Introduction of the SPI-mem dirmap API for SPI-NAND devices.
 
 Raw NAND controller drivers changes:
 - nandsim:
   * Switch to ->exec-op().
 - meson:
   * Misc cleanups and fixes.
   * New OOB layout.
 - Sunxi:
   * A23/A33 NAND DMA support.
 - Ingenic:
   * Full reorganization and cleanup.
   * Clear separation between NAND controller and ECC engine.
   * Support JZ4740 an JZ4725B.
 - Denali:
   * Clear controller/chip separation.
   * ->exec_op() migration.
   * Various cleanups.
 - fsl_elbc:
   * Enable software ECC support.
 - Atmel:
   * Sam9x60 support.
 - GPMI:
   * Introduce the GPMI_IS_MXS() macro.
 - Various trivial/spelling/coding style fixes.
 
 SPI NOR core changes:
 - Print all JEDEC ID bytes on error
 - Fix comment of spi_nor_find_best_erase_type()
 - Add region locking flags for s25fl512s
 
 SPI NOR controller drivers changes:
 - intel-spi:
   * Avoid crossing 4K address boundary on read/write
   * Add support for Intel Comet Lake SPI serial flash
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlzYiU4WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wX1HEACay8s/mUEWcLO3JKWy6KiC3756
 1CGB3p5b621kKP6ooPWvV7UAv1Q2IKkLIwKaLE5W5FuKW9bVnN6H/yejVT8vYPK9
 /5AbcqbdNKfrnYBnfv3SHH8jSYo6HjwwNsF7OcR/yiXvk/JUFX+VJQdR01HEzz+Z
 TWzkm4n5+vat5pJSGBs7JwRBlatuiCHul7Lz2dZYkF/ZdGIQgL5ftOr1goLsr88+
 Hxn7Wmp3eBVZbQMf83BD7wf/Nv+oycToKBqklMZqMBEgK5mT6WDkT65HG4XMfzMz
 0CcPReMHlTZVqJHHZFgTSXVPJJHu8Nl4qmJIAaf1hnmvx7yFW6LD0C1zKpu6uwRm
 +qVpe/fTDArLCEwLouLND6Y9MC7kkERkDE3jwcwSQ/PZcE3kdHKwIhmJ/19utI8k
 zk9pWGAWvtuoY1b+dNFxT4YcUxrHOWSxYcUZHcZvQHQr7Bvxskg92P1fOU0wlgC/
 tXRtXUNCB5YsUU5x8Ph6+786dsCMcwCDoQQzwegecrbc6sK7n3KSYAcoNfv5ATwI
 C+Myoawul/XsxQvUyYbDIr8T4Yyda1BLs92XHxg1Di3kTC2m0OZL8sWJboQ7I/CI
 GkiJm5hFvzwniE+yrqE4n4jnCkoP5Y4kRtX70VDK3pIVDZFPs93lgYaYTFcfp93G
 scfn1MoI/bE7jDzpbA==
 =HXap
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux

Pull MTD updates from Richard Weinberger:
 "MTD core changes:
   - New AFS partition parser
   - Update MAINTAINERS entry
   - Use of fall-throughs markers

  NAND core changes:
   - Support having the bad block markers in either the first, second or
     last page of a block. The combination of all three location is now
     possible.
   - Constification of NAND_OP_PARSER(_PATTERN) elements.
   - Generic NAND DT bindings changed to yaml format (can be used to
     check the proposed bindings. First platform to be fully supported:
     sunxi.
   - Stopped using several legacy hooks.
   - Preparation to use the generic NAND layer with the addition of
     several helpers and the removal of the struct nand_chip from
     generic functions.
   - Kconfig cleanup to prepare the introduction of external ECC engines
     support.
   - Fallthrough comments.
   - Introduction of the SPI-mem dirmap API for SPI-NAND devices.

  Raw NAND controller drivers changes:
   - nandsim:
      - Switch to ->exec-op().
   - meson:
      - Misc cleanups and fixes.
      - New OOB layout.
   - Sunxi:
      - A23/A33 NAND DMA support.
   - Ingenic:
      - Full reorganization and cleanup.
      - Clear separation between NAND controller and ECC engine.
      - Support JZ4740 an JZ4725B.
   - Denali:
      - Clear controller/chip separation.
      - ->exec_op() migration.
      - Various cleanups.
   - fsl_elbc:
      - Enable software ECC support.
   - Atmel:
      - Sam9x60 support.
   - GPMI:
      - Introduce the GPMI_IS_MXS() macro.
   - Various trivial/spelling/coding style fixes.

  SPI NOR core changes:
   - Print all JEDEC ID bytes on error
   - Fix comment of spi_nor_find_best_erase_type()
   - Add region locking flags for s25fl512s

  SPI NOR controller drivers changes:
   - intel-spi:
      - Avoid crossing 4K address boundary on read/write
      - Add support for Intel Comet Lake SPI serial flash"

* tag 'mtd/for-5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (120 commits)
  mtd: part: fix incorrect format specifier for an unsigned long long
  mtd: lpddr_cmds: Mark expected switch fall-through
  mtd: phram: Mark expected switch fall-throughs
  mtd: cfi_cmdset_0002: Mark expected switch fall-throughs
  mtd: cfi_util: mark expected switch fall-throughs
  MAINTAINERS: MTD Git repository is hosted on kernel.org
  MAINTAINERS: Update jffs2 entry
  mtd: afs: add v2 partition parsing
  mtd: afs: factor the IIS read into partition parser
  mtd: afs: factor footer parsing into the v1 part parsing
  mtd: factor out v1 partition parsing
  mtd: afs: simplify partition detection
  mtd: afs: simplify partition parsing
  mtd: partitions: Add OF support to AFS partitions
  mtd: partitions: Add AFS partitions DT bindings
  mtd: afs: Move AFS partition parser to parsers subdir
  mtd: maps: Make uclinux_ram_map static
  mtd: maps: Allow MTD_PHYSMAP with MTD_RAM
  MAINTAINERS: Add myself as MTD maintainer
  MAINTAINERS: Remove my name from the MTD and NAND entries
  ...
2019-05-12 17:57:52 -04:00
Linus Torvalds
983dfa4b6e This pull request contains the following changes for UML:
- Kconfig cleanups
 - Fix cpu_all_mask() usage
 - Various bug fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlzYi30WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wdDcD/wLx0xljjSb+j08VVSvVWGah1Vl
 DMVyLp1Eik8KRnc6vR+IfC6qDE2+QmJvcLLx4IQ8wpgce+mvhLSy0+8SNsU9tz7t
 7ZYVR++L3If3dx72J1aJquQt4PNLQn7QAdPWOA/FiYy4mqjxZUg4HVwf/Oge/2Un
 jfom649xl1gdcYlXTCOadb4Xmqo1BSEW+Ms1zqrQlBpU6ePMvojPkjBMdaCbCjMg
 bLt4XjtVbgBH3FnH0ZvuDzrMW229LiLot4KF0iUW36/gV/ZRATbinst5AQ5mUsMP
 GgrqbeU+wDdzt73p/l1NG7u3DZHOhoAW1ZWTqwBMKiazQiJPa90V9TIOwbnSl7zc
 hBEKKkU/u6p5E5TADcTty9ZJfCM+3Zatqt004WSbi+ug363G08XrTb3wWz6AruQ/
 9shTUmzwYsK1Bzllf2T2WShBrN+vMdmpzf4+v66N1KhcPrb7Eh81N/VhQG+rvfSb
 Ju/lDhu6OxlHr9OlGinI0SCLgjpk3qWcNd1noFdQsTewIopQsOL6H4R7711md3ow
 PWl7HAspvCRD3ub12y0wS3bb/4AUyoBrMDT/VBfk2vH0BbCzlR/ckaKE+lk2Y2Mr
 BpURt1zcqnpqi5LqRC//dhCFPyzpXd+yYVy1P6bN8q5lvfuIoaRdl2YeWjMfoo0v
 r+loEdGNa57Qj67ncg==
 =HB9o
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML updates from Richard Weinberger:

 - Kconfig cleanups

 - Fix cpu_all_mask() usage

 - Various bug fixes

* tag 'for-linus-5.2-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: irq: don't set the chip for all irqs
  um: define set_pte_at() as a static inline function, not a macro
  um: remove uses of variable length arrays
  um: remove unused variable
  uml: fix a boot splat wrt use of cpu_all_mask
  um: Do not unlock mutex that is not hold.
  hostfs: fix mismatch between link_file definition and declaration
  arch: um: drivers: Kconfig: pedantic formatting
  arch: um: Kconfig: pedantic indention cleanups
  um: Revert to using stack for pt_regs in signal handling
2019-05-12 17:52:13 -04:00
Linus Torvalds
47782361ac chrome platform changes for v5.2
CrOS EC:
 
 - Add EC host command support using rpmsg
 - Add new CrOS USB PD logging driver
 - Transfer spi messages at high priority
 - Add support to trace CrOS EC commands
 - Minor fixes and cleanups in protocol and debugfs
 
 Wilco EC:
 
 - Standardize Wilco EC mailbox interface
 - Add h1_gpio status to debugfs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6gYDF28Li+nEiKLaHwn1ewov5lgFAlzV61YACgkQHwn1ewov
 5lgEFw//S7GVmBrFxcqu5wAjF1CW+mOGi3y6nVuTAHanWG/hJGWT+itOwsdDp6c9
 TggYgohprz64JAZOPqPCTonV/qbxgsfKrSQRxFDtHH4F1iEUF46fnlsULDKi8VwM
 Qzj4g4d//ePsOwHOsYVrbJRU2qKyF6Rm2hpOxKfI9u2Dv5fxLFu6fxUhrSq1Inr6
 U67j7pxBwOnBtN2A6hMKHZaOUVkSNYT6azSPO3Z2YH0aky2Baxw/LPoRnbCNhwUQ
 iyneX5+K0wpCz2fpnBF/QSh1QBACeyfrO6HHA+flfaejhShaWttrS36Gar+sdHFN
 p6eeR1CoEJZbRY79Eetj8Cv5Be1ivVG/SC5JF4O1apAAn87wXLI6AaLG/03ul0vc
 KOkcjrXMxISRlAUr+OKD0rg3Uo2oI0ht70XMT9DDsCRNDoVHvkDQJNdkWrKq+E1c
 xL4YeLofZpcEN+Oe/WnwUZtYUdY3qcWs+C4hV+h0L0Ke5xir25DEUfF3j3J/uK2B
 JEgkTpH8j6YjbGAErBPkTxWt5HE3oWtkK4moPlrfPKfxoSo2eRDvqz68qHsgIn8p
 WBM+FSr+dQ7qyYDigMKrFSesiBpwCBI4lIgPxkvTxqbubaoZcsABHm3BUGjykXII
 E5z2qsgRnDrB+uGGDkTvDoR0Kr3U0hGlag7u/N61H86PoiMLUig=
 =NF5N
 -----END PGP SIGNATURE-----

Merge tag 'tag-chrome-platform-for-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform updates from Benson Leung:
 "CrOS EC:
   - Add EC host command support using rpmsg
   - Add new CrOS USB PD logging driver
   - Transfer spi messages at high priority
   - Add support to trace CrOS EC commands
   - Minor fixes and cleanups in protocol and debugfs

  Wilco EC:
   - Standardize Wilco EC mailbox interface
   - Add h1_gpio status to debugfs"

* tag 'tag-chrome-platform-for-v5.2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_proto: Add trace event to trace EC commands
  platform/chrome: cros_ec_debugfs: Use cros_ec_cmd_xfer_status helper
  platform/chrome: cros_ec: Add EC host command support using rpmsg
  platform/chrome: wilco_ec: Add h1_gpio status to debugfs
  platform/chrome: wilco_ec: Standardize mailbox interface
  platform/chrome: cros_ec_proto: check for NULL transfer function
  platform/chrome: Add CrOS USB PD logging driver
  platform/chrome: cros_ec_spi: Transfer messages at high priority
  platform/chrome: cros_ec_debugfs: no need to check return value of debugfs_create functions
  platform/chrome: cros_ec_debugfs: Remove dev_warn when console log is not supported
2019-05-12 07:00:21 -04:00
Linus Torvalds
8148c17b17 This is the bulk of the GPIO changes for the v5.2 kernel cycle:
Core changes:
 - The gpiolib MMIO driver has been enhanced to handle two direction
   registers, i.e. one register to set lines as input and one register
   to set lines as output. It turns out some silicon engineer thinks
   the ability to configure a line as input and output at the same
   time makes sense, this can be debated but includes a lot of analog
   electronics reasoning, and the registers are there and need to
   be handled consistently. Unsurprisingly, we enforce the lines to
   be either inputs or outputs in such schemes.
 - Send in the proper argument value to .set_config() dispatched to
   the pin control subsystem. Nobody used it before, now someone
   does, so fix it to work as expected.
 - The ACPI gpiolib portions can now handle pin bias setting (pull up
   or pull down). This has been in the ACPI spec for years and we
   finally have it properly integrated with Linux GPIOs. It was based
   on an observation from Andy Schevchenko that Thomas Petazzoni's
   changes to the core for biasing the PCA950x GPIO expander actually
   happen to fit hand-in-glove with what the ACPI core needed.
   Such nice synergies happen sometimes.
 
 New drivers:
 - A new driver for the Mellanox BlueField GPIO controller. This is
   using 64bit MMIO registers and can configure lines as inputs
   and outputs at the same time and after improving the MMIO library
   we handle it just fine. Interesting.
 - A new IXP4xx proper gpiochip driver with hierarchical interrupts
   should be coming in from the ARM SoC tree as well.
 
 Driver enhancements:
 - The PCA053x driver handles the CAT9554 GPIO expander.
 - The PCA053x driver handles the NXP PCAL6416 GPIO expander.
 - Wake-up support on PCA053x GPIO lines.
 - OMAP now does a nice asynchronous IRQ handling on wake-ups by
   letting everything wake up on edges, and this makes runtime PM
   work as expected too.
 
 Misc:
 - Several cleanups such as devres fixes.
 - Get rid of some languager comstructs that cause problems when
   compiling with LLVMs clang.
 - Documentation review and update.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJc1olZAAoJEEEQszewGV1zEU4P/RmTf3hG8xmNPS3MDTmR6gAy
 /YJOXjXBf3CD/dmEAyyaNLnUQismrtRNvHSoEGbno7gkU+htzp9UfUJkj6+HIXs2
 RpF+Hi78HzZNDxGWuBLu6OZolpmBtx+sRKOhHk/XfNS45qd1FgXWDuulzsYa9Xsr
 hYMXdtdv9wY/vcc68q1rtKAbzlu5ZNCa3Zj1iNOr/XQt3Nl2BW66hGLgjK4mOvgx
 fJy4rFXuDIMfDvo69U1Opz2b39sfE7XMhfZS/MOgg4yEV9zGRgDoI1tyMcTqGb8Q
 8LQbp5dXkP+3dJQB8tgbu3Vk4WC1Rd/pmIli5sMgsk0HYQ6XegfT6HJKozSmwN9r
 0s8jKlrocWZvdPo1aJwQgtRS56t2rFWcrcRye8bLqxkkW5cYIq9CwkE8USwB31Kv
 PFpoOwRuCtj0gkCxf7WIEcC5NAkYPow3K1KPdk3E0Si6I3pj0NqqlaAD0JAlkC2V
 aPq3xbTuFCAdmcADEt2Z+dUJ7WIs5Y9oQgosMAx+A2AD4K3QDBMu3pZsT6SCu4XZ
 mK0eWJi9/CvOj/s7bA0BEJVxQA+p8KYsNRBOULg/8aAOqGcLnSydQjqrxDTE8YrL
 xmmRG7i7ht0B9CchZuIB5hqdvjbCgvcVa5OnCUDfLxE0GdCx8iJ9y9OrsMXbabYq
 8FcPDo1N38cTYLnLqvKI
 =rhto
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull gpio updates from Linus Walleij:
 "This is the bulk of the GPIO changes for the v5.2 kernel cycle. A bit
  later than usual because I was ironing out my own mistakes. I'm
  holding some stuff back for the next kernel as a result, and this
  should be a healthy and well tested batch.

  Core changes:

   - The gpiolib MMIO driver has been enhanced to handle two direction
     registers, i.e. one register to set lines as input and one register
     to set lines as output. It turns out some silicon engineer thinks
     the ability to configure a line as input and output at the same
     time makes sense, this can be debated but includes a lot of analog
     electronics reasoning, and the registers are there and need to be
     handled consistently. Unsurprisingly, we enforce the lines to be
     either inputs or outputs in such schemes.

   - Send in the proper argument value to .set_config() dispatched to
     the pin control subsystem. Nobody used it before, now someone does,
     so fix it to work as expected.

   - The ACPI gpiolib portions can now handle pin bias setting (pull up
     or pull down). This has been in the ACPI spec for years and we
     finally have it properly integrated with Linux GPIOs. It was based
     on an observation from Andy Schevchenko that Thomas Petazzoni's
     changes to the core for biasing the PCA950x GPIO expander actually
     happen to fit hand-in-glove with what the ACPI core needed. Such
     nice synergies happen sometimes.

  New drivers:

   - A new driver for the Mellanox BlueField GPIO controller. This is
     using 64bit MMIO registers and can configure lines as inputs and
     outputs at the same time and after improving the MMIO library we
     handle it just fine. Interesting.

   - A new IXP4xx proper gpiochip driver with hierarchical interrupts
     should be coming in from the ARM SoC tree as well.

  Driver enhancements:

   - The PCA053x driver handles the CAT9554 GPIO expander.

   - The PCA053x driver handles the NXP PCAL6416 GPIO expander.

   - Wake-up support on PCA053x GPIO lines.

   - OMAP now does a nice asynchronous IRQ handling on wake-ups by
     letting everything wake up on edges, and this makes runtime PM work
     as expected too.

  Misc:

   - Several cleanups such as devres fixes.

   - Get rid of some languager comstructs that cause problems when
     compiling with LLVMs clang.

   - Documentation review and update"

* tag 'gpio-v5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (85 commits)
  gpio: Update documentation
  docs: gpio: convert docs to ReST and rename to *.rst
  gpio: sch: Remove write-only core_base
  gpio: pxa: Make two symbols static
  gpiolib: acpi: Respect pin bias setting
  gpiolib: acpi: Add acpi_gpio_update_gpiod_lookup_flags() helper
  gpiolib: acpi: Set pin value, based on bias, more accurately
  gpiolib: acpi: Change type of dflags
  gpiolib: Introduce GPIO_LOOKUP_FLAGS_DEFAULT
  gpiolib: Make use of enum gpio_lookup_flags consistent
  gpiolib: Indent entry values of enum gpio_lookup_flags
  gpio: pca953x: add support for pca6416
  dt-bindings: gpio: pca953x: document the nxp,pca6416
  gpio: pca953x: add pcal6416 to the of_device_id table
  gpio: gpio-omap: Remove conditional pm_runtime handling for GPIO interrupts
  gpio: gpio-omap: configure edge detection for level IRQs for idle wakeup
  tracing: stop making gpio tracing configurable
  gpio: pca953x: Configure wake-up path when wake-up is enabled
  gpio: of: Optimize quirk checks
  gpio: mmio: Drop bgpio_dir_inverted
  ...
2019-05-11 10:54:43 -04:00
Linus Torvalds
6fe567df04 VFIO updates for v5.2-rc1
- Improve dev_printk() usage (Bjorn Helgaas)
 
  - Fix issue with blocking in !TASK_RUNNING state while waiting for
    userspace to release devices (Farhan Ali)
 
  - Fix error path cleanup in nvlink setup (Greg Kurz)
 
  - mdev-core cleanups and fixes in preparation for more use cases
    (Parav Pandit)
 
  - Cornelia has volunteered as an official vfio reviewer (Cornelia Huck)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJc1e9mAAoJECObm247sIsiB+sP/Rs94smfIyO/N3a73gd3KrAV
 wmnNGLbUGEEoumUmzWYjSq/l3zfehYpe2kiKJklH8sUPP+kGL0eQj2H5++/LWC3E
 EMMPGVoD0wHHoWwdVKY79xjDIUUeNZe2IFVUjLgwJ9UD79DBBGJUMpWQhbuUDkl1
 nGCb0ltzu6H+OzELLZxXSC7QdhnB97mRaamSI2sunTM7tr6QaL13YsrFES4mpj23
 vIGElbdpyPBeMMbh2rhhb581RDXEv5GCy6SKfBpHOMay4rpr37YX8CJ/7uh4rNcn
 DF3aikkK2NpVv5Rk1+AJvvri+MDmOF9TMC5EG24swEAqJrr4jYyWXvPb/WKNDm2w
 Z0qxBHkZLPJ9kARMQxuAAqJ5vKDwy/FgRjoZi0aEsOjJO+HYeCdIKkueFXdWXw2O
 pL5IdZr5VSejYdVxjV2Ft6y90dQjxIAdDd6QJDnuEAu2JEb2T1q9iea/QOMnEJyD
 QM3h1mx/rNZnkmEVgpE4t9TGnoPMmg/grzcfu+8wQZk8ys1uqSfBSdgWoBgheQ9z
 XJDHCvkRG7bc/VTVcet+HPBvK38Kdv0Er+8eHNmG4c11ifgODzShbXl5oKgDE3iC
 WJi/ilYVn2dleo/4ZqiCP+U/PEVgED4k4pvj0vWhaE7CfGJDC60Te2/q+aB4sTMI
 4EChWOml/T545Hzv6swn
 =JSGI
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v5.2-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Improve dev_printk() usage (Bjorn Helgaas)

 - Fix issue with blocking in !TASK_RUNNING state while waiting for
   userspace to release devices (Farhan Ali)

 - Fix error path cleanup in nvlink setup (Greg Kurz)

 - mdev-core cleanups and fixes in preparation for more use cases (Parav
   Pandit)

 - Cornelia has volunteered as an official vfio reviewer (Cornelia Huck)

* tag 'vfio-v5.2-rc1' of git://github.com/awilliam/linux-vfio:
  vfio: Add Cornelia Huck as reviewer
  vfio/mdev: Avoid inline get and put parent helpers
  vfio/mdev: Fix aborting mdev child device removal if one fails
  vfio/mdev: Follow correct remove sequence
  vfio/mdev: Avoid masking error code to EBUSY
  vfio/mdev: Drop redundant extern for exported symbols
  vfio/mdev: Removed unused kref
  vfio/mdev: Avoid release parent reference during error path
  vfio-pci/nvlink2: Fix potential VMA leak
  vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"
  vfio: Use dev_printk() when possible
2019-05-11 10:47:46 -04:00
Linus Torvalds
c367dc8d0d Merge branch 'next-tomoyo2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull tomoyo updates from James Morris:
 "Fixes to enable fuzz testing, and a fix for calculating whether a
  filesystem is user-modifiable"

* 'next-tomoyo2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  tomoyo: Don't emit WARNING: string while fuzzing testing.
  tomoyo: Change pathname calculation for read-only filesystems.
  tomoyo: Check address length before reading address family
  tomoyo: Add a kernel config option for fuzzing testing.
2019-05-11 10:38:59 -04:00