Commit Graph

53744 Commits

Author SHA1 Message Date
SeongJae Park
0d16cfd46b Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Some descriptions of page flags in 'pagemap.rst' are written in
assumption of none-rst, which respects every new line, as below:

    7 - SLAB
       page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
       When compound page is used, SLUB/SLQB will only set this flag on the head

Because rst ignores the new line between the first sentence and second
sentence, resulting html looks a little bit weird, as below.

    7 - SLAB
    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator When
                                                                       ^
    compound page is used, SLUB/SLQB will only set this flag on the head
    page; SLOB will not flag it at all.

This change makes it more natural and consistent with other parts in the
rendered version.

Link: https://lkml.kernel.org/r/20211022090311.3856-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:46 -07:00
SeongJae Park
b1eee3c548 Docs/admin-guide/mm/damon/start: simplify the content
Information in 'TL; DR' section of 'Getting Started' is duplicated in
other parts of the doc.  It is also asking readers to visit the access
pattern visualizations gallery web site to show the results of example
visualization commands, while the users of the commands can use terminal
output.

To make the doc simple, this removes the duplicated 'TL; DR' section and
replaces the visualization example commands with versions using terminal
outputs.

Link: https://lkml.kernel.org/r/20211022090311.3856-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:46 -07:00
SeongJae Park
49ce7dee10 Docs/admin-guide/mm/damon/start: fix a wrong link
The 'Getting Started' of DAMON is providing a link to DAMON's user
interface document while saying about its user space tool's detailed
usages.  This fixes the link.

Link: https://lkml.kernel.org/r/20211022090311.3856-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:46 -07:00
SeongJae Park
82e3fff55d Docs/admin-guide/mm/damon/start: fix wrong example commands
Patch series "Fix trivial nits in Documentation/admin-guide/mm".

This patchset fixes trivial nits in admin guide documents for DAMON and
pagemap.

This patch (of 4):

Some of the example commands in DAMON getting started guide are
outdated, missing sudo, or just wrong.  This fixes those.

Link: https://lkml.kernel.org/r/20211022090311.3856-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:46 -07:00
SeongJae Park
bec976b691 Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
This adds an admin-guide document for DAMON-based Reclamation.

Link: https://lkml.kernel.org/r/20211019150731.16699-16-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:46 -07:00
SeongJae Park
c638072107 Docs/DAMON: document physical memory monitoring support
This updates the DAMON documents for the physical memory address space
monitoring support.

Link: https://lkml.kernel.org/r/20211012205711.29216-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:45 -07:00
SeongJae Park
c2fe4987ed Docs/admin-guide/mm/damon: document 'init_regions' feature
This adds description of the 'init_regions' feature in the DAMON usage
document.

Link: https://lkml.kernel.org/r/20211012205711.29216-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:44 -07:00
SeongJae Park
68536f8e01 Docs/admin-guide/mm/damon: document DAMON-based Operation Schemes
This adds the description of DAMON-based operation schemes in the DAMON
documents.

Link: https://lkml.kernel.org/r/20211001125604.29660-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rienjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Markus Boehme <markubo@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:44 -07:00
SeongJae Park
876d0aac2e docs/vm/damon: remove broken reference
Building DAMON documents warns for a reference to nonexisting doc, as
below:

    $ time make htmldocs
    [...]
    Documentation/vm/damon/index.rst:24: WARNING: toctree contains reference to nonexisting document 'vm/damon/plans'

This fixes the warning by removing the wrong reference.

Link: https://lkml.kernel.org/r/20210917123958.3819-4-sj@kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:44 -07:00
SeongJae Park
ad782c48df Documentation/vm: move user guides to admin-guide/mm/
Most memory management user guide documents are in 'admin-guide/mm/',
but two of those are in 'vm/'.  This moves the two docs into
'admin-guide/mm' for easier documents finding.

Link: https://lkml.kernel.org/r/20210917123958.3819-2-sj@kernel.org
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:44 -07:00
Marco Elver
4f612ed3f7 kfence: default to dynamic branch instead of static keys mode
We have observed that on very large machines with newer CPUs, the static
key/branch switching delay is on the order of milliseconds.  This is due
to the required broadcast IPIs, which simply does not scale well to
hundreds of CPUs (cores).  If done too frequently, this can adversely
affect tail latencies of various workloads.

One workaround is to increase the sample interval to several seconds,
while decreasing sampled allocation coverage, but the problem still
exists and could still increase tail latencies.

As already noted in the Kconfig help text, there are trade-offs: at
lower sample intervals the dynamic branch results in better performance;
however, at very large sample intervals, the static keys mode can result
in better performance -- careful benchmarking is recommended.

Our initial benchmarking showed that with large enough sample intervals
and workloads stressing the allocator, the static keys mode was slightly
better.  Evaluating and observing the possible system-wide side-effects
of the static-key-switching induced broadcast IPIs, however, was a blind
spot (in particular on large machines with 100s of cores).

Therefore, a major downside of the static keys mode is, unfortunately,
that it is hard to predict performance on new system architectures and
topologies, but also making conclusions about performance of new
workloads based on a limited set of benchmarks.

Most distributions will simply select the defaults, while targeting a
large variety of different workloads and system architectures.  As such,
the better default is CONFIG_KFENCE_STATIC_KEYS=n, and re-enabling it is
only recommended after careful evaluation.

For reference, on x86-64 the condition in kfence_alloc() generates
exactly
2 instructions in the kmem_cache_alloc() fast-path:

 | ...
 | cmpl   $0x0,0x1a8021c(%rip)  # ffffffff82d560d0 <kfence_allocation_gate>
 | je     ffffffff812d6003      <kmem_cache_alloc+0x243>
 | ...

which, given kfence_allocation_gate is infrequently modified, should be
well predicted by most CPUs.

Link: https://lkml.kernel.org/r/20211019102524.2807208-2-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00
Marco Elver
5cc906b4b4 kfence: add note to documentation about skipping covered allocations
Add a note briefly mentioning the new policy about "skipping currently
covered allocations if pool close to full." Since this has a notable
impact on KFENCE's bug-detection ability on systems with large uptimes,
it is worth pointing out the feature.

Link: https://lkml.kernel.org/r/20210923104803.2620285-5-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Aleksandr Nogikh <nogikh@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Taras Madan <tarasmadan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00
Brian Geffon
755804d169 zram: introduce an aged idle interface
This change introduces an aged idle interface to the existing idle sysfs
file for zram.

When CONFIG_ZRAM_MEMORY_TRACKING is enabled the idle file now also
accepts an integer argument.  This integer is the age (in seconds) of
pages to mark as idle.  The idle file still supports 'all' as it always
has.  This new approach allows for much more control over which pages
get marked as idle.

[bgeffon@google.com: use IS_ENABLED and cleanup comment]
  Link: https://lkml.kernel.org/r/20210924161128.1508015-1-bgeffon@google.com
[bgeffon@google.com: Sergey's cleanup suggestions]
  Link: https://lkml.kernel.org/r/20210929143056.13067-1-bgeffon@google.com

Link: https://lkml.kernel.org/r/20210923130115.1344361-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:43 -07:00
David Hildenbrand
6b740c6c3a mm/memory_hotplug: remove HIGHMEM leftovers
We don't support CONFIG_MEMORY_HOTPLUG on 32 bit and consequently not
HIGHMEM.  Let's remove any leftover code -- including the unused
"status_change_nid_high" field part of the memory notifier.

Link: https://lkml.kernel.org/r/20210929143600.49379-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alex Shi <alexs@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
David Hildenbrand
9e122cc1bd memory-hotplug.rst: document the "auto-movable" online policy
Commit e83a437faa ("mm/memory_hotplug: introduce "auto-movable" online
policy") introduced a new memory online policy to automatically select a
zone for memory blocks to be onlined.  It added a way to set the active
online policy and tunables for the auto-movable online policy.

Follow-up commits tweaked the "auto-movable" policy to also consider
memory device details when selecting zones for memory blocks to be
onlined.

Let's document the new toggles and how the two online policies we have
work.

[david@redhat.com: updates]
  Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com

Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
David Hildenbrand
a8db400f99 memory-hotplug.rst: fix wrong /sys/module/memory_hotplug/parameters/ path
We accidentially added a superfluous "s".

Link: https://lkml.kernel.org/r/20210930144117.23641-3-david@redhat.com
Fixes: ac3332c447 ("memory-hotplug.rst: complete admin-guide overhaul")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
David Hildenbrand
d83fe3c99d memory-hotplug.rst: fix two instances of "movablecore" that should be "movable_node"
Patch series "memory-hotplug.rst: document the "auto-movable" online
policy".

Now that the memory-hotplug.rst overhaul is upstream, proper
documentation for the "auto-movable" online policy, documenting all new
toggles and options.  Along, two fixes for the original overhaul.

This patch (of 3):

We really want to refer to the "movable_node" kernel command line
parameter here.

Link: https://lkml.kernel.org/r/20210930144117.23641-2-david@redhat.com
Fixes: ac3332c447 ("memory-hotplug.rst: complete admin-guide overhaul")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:42 -07:00
Zhenguo Yao
b5389086ad hugetlbfs: extend the definition of hugepages parameter to support node allocation
We can specify the number of hugepages to allocate at boot.  But the
hugepages is balanced in all nodes at present.  In some scenarios, we
only need hugepages in one node.  For example: DPDK needs hugepages
which are in the same node as NIC.

If DPDK needs four hugepages of 1G size in node1 and system has 16 numa
nodes we must reserve 64 hugepages on the kernel cmdline.  But only four
hugepages are used.  The others should be free after boot.  If the
system memory is low(for example: 64G), it will be an impossible task.

So extend the hugepages parameter to support specifying hugepages on a
specific node.  For example add following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage in node0 and 3 hugepages in node1.

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:41 -07:00
Zhenliang Wei
f7df2b1cf0 tools/vm/page_owner_sort.c: count and sort by mem
When viewing page owner information, we may be more concerned about the
total memory rather than the times of stack appears.  Therefore, the
following adjustments are made:

1. Added the statistics on the total number of pages.

2. Added the optional parameter "-m" to configure the program to sort by
   memory (total pages).

The general output of page_owner is as follows:

	Page allocated via order XXX, ...
	PFN XXX ...
	 // Detailed stack

	Page allocated via order XXX, ...
	PFN XXX ...
	 // Detailed stack

The original page_owner_sort ignores PFN rows, puts the remaining rows
in buf, counts the times of buf, and finally sorts them according to the
times.  General output:

	XXX times:
	Page allocated via order XXX, ...
	 // Detailed stack

Now, we use regexp to extract the page order value from the buf, and
count the total pages for the buf.  General output:

	XXX times, XXX pages:
	Page allocated via order XXX, ...
	 // Detailed stack

By default, it is still sorted by the times of buf; If you want to sort
by the pages nums of buf, use the new -m parameter.

Link: https://lkml.kernel.org/r/1631678242-41033-1-git-send-email-weizhenliang@huawei.com
Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:40 -07:00
Baolin Wang
38e719ab26 hugetlb: support node specified when using cma for gigantic hugepages
Now the size of CMA area for gigantic hugepages runtime allocation is
balanced for all online nodes, but we also want to specify the size of
CMA per-node, or only one node in some cases, which are similar with
patch [1].

For example, on some multi-nodes systems, each node's memory can be
different, allocating the same size of CMA for each node is not suitable
for the low-memory nodes.  Meanwhile some workloads like DPDK mentioned
by Zhenguo in patch [1] only need hugepages in one node.

On the other hand, we have some machines with multiple types of memory,
like DRAM and PMEM (persistent memory).  On this system, we may want to
specify all the hugepages only on DRAM node, or specify the proportion
of DRAM node and PMEM node, to tuning the performance of the workloads.

Thus this patch adds node format for 'hugetlb_cma' parameter to support
specifying the size of CMA per-node.  An example is as follows:

  hugetlb_cma=0:5G,2:5G

which means allocating 5G size of CMA area on node 0 and node 2
respectively.  And the users should use the node specific sysfs file to
allocate the gigantic hugepages if specified the CMA size on that node.

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com [1]
Link: https://lkml.kernel.org/r/bb790775ca60bb8f4b26956bb3f6988f74e075c7.1634261144.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:39 -07:00
Mike Kravetz
79dfc69552 hugetlb: add demote hugetlb page sysfs interfaces
Patch series "hugetlb: add demote/split page functionality", v4.

The concurrent use of multiple hugetlb page sizes on a single system is
becoming more common.  One of the reasons is better TLB support for
gigantic page sizes on x86 hardware.  In addition, hugetlb pages are
being used to back VMs in hosting environments.

When using hugetlb pages to back VMs, it is often desirable to
preallocate hugetlb pools.  This avoids the delay and uncertainty of
allocating hugetlb pages at VM startup.  In addition, preallocating huge
pages minimizes the issue of memory fragmentation that increases the
longer the system is up and running.

In such environments, a combination of larger and smaller hugetlb pages
are preallocated in anticipation of backing VMs of various sizes.  Over
time, the preallocated pool of smaller hugetlb pages may become depleted
while larger hugetlb pages still remain.  In such situations, it is
desirable to convert larger hugetlb pages to smaller hugetlb pages.

Converting larger to smaller hugetlb pages can be accomplished today by
first freeing the larger page to the buddy allocator and then allocating
the smaller pages.  For example, to convert 50 GB pages on x86:

  gb_pages=`cat .../hugepages-1048576kB/nr_hugepages`
  m2_pages=`cat .../hugepages-2048kB/nr_hugepages`
  echo $(($gb_pages - 50)) > .../hugepages-1048576kB/nr_hugepages
  echo $(($m2_pages + 25600)) > .../hugepages-2048kB/nr_hugepages

On an idle system this operation is fairly reliable and results are as
expected.  The number of 2MB pages is increased as expected and the time
of the operation is a second or two.

However, when there is activity on the system the following issues
arise:

1) This process can take quite some time, especially if allocation of
   the smaller pages is not immediate and requires migration/compaction.

2) There is no guarantee that the total size of smaller pages allocated
   will match the size of the larger page which was freed. This is
   because the area freed by the larger page could quickly be
   fragmented.

In a test environment with a load that continually fills the page cache
with clean pages, results such as the following can be observed:

  Unexpected number of 2MB pages allocated: Expected 25600, have 19944
  real    0m42.092s
  user    0m0.008s
  sys     0m41.467s

To address these issues, introduce the concept of hugetlb page demotion.
Demotion provides a means of 'in place' splitting of a hugetlb page to
pages of a smaller size.  This avoids freeing pages to buddy and then
trying to allocate from buddy.

Page demotion is controlled via sysfs files that reside in the per-hugetlb
page size and per node directories.

 - demote_size
        Target page size for demotion, a smaller huge page size. File
        can be written to chose a smaller huge page size if multiple are
        available.

 - demote
        Writable number of hugetlb pages to be demoted

To demote 50 GB huge pages, one would:

  cat .../hugepages-1048576kB/free_hugepages   /* optional, verify free pages */
  cat .../hugepages-1048576kB/demote_size      /* optional, verify target size */
  echo 50 > .../hugepages-1048576kB/demote

Only hugetlb pages which are free at the time of the request can be
demoted.  Demotion does not add to the complexity of surplus pages and
honors reserved huge pages.  Therefore, when a value is written to the
sysfs demote file, that value is only the maximum number of pages which
will be demoted.  It is possible fewer will actually be demoted.  The
recently introduced per-hstate mutex is used to synchronize demote
operations with other operations that modify hugetlb pools.

Real world use cases
--------------------
The above scenario describes a real world use case where hugetlb pages
are used to back VMs on x86.  Both issues of long allocation times and
not necessarily getting the expected number of smaller huge pages after
a free and allocate cycle have been experienced.  The occurrence of
these issues is dependent on other activity within the host and can not
be predicted.

This patch (of 5):

Two new sysfs files are added to demote hugtlb pages.  These files are
both per-hugetlb page size and per node.  Files are:

  demote_size - The size in Kb that pages are demoted to. (read-write)
  demote - The number of huge pages to demote. (write-only)

By default, demote_size is the next smallest huge page size.  Valid huge
page sizes less than huge page size may be written to this file.  When
huge pages are demoted, they are demoted to this size.

Writing a value to demote will result in an attempt to demote that
number of hugetlb pages to an appropriate number of demote_size pages.

NOTE: Demote interfaces are only provided for huge page sizes if there
is a smaller target demote huge page size.  For example, on x86 1GB huge
pages will have demote interfaces.  2MB huge pages will not have demote
interfaces.

This patch does not provide full demote functionality.  It only provides
the sysfs interfaces.

It also provides documentation for the new interfaces.

[mike.kravetz@oracle.com: n_mask initialization does not need to be protected by the mutex]
  Link: https://lkml.kernel.org/r/0530e4ef-2492-5186-f919-5db68edea654@oracle.com

Link: https://lkml.kernel.org/r/20211007181918.136982-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: David Rientjes <rientjes@google.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Nghia Le <nghialm78@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:39 -07:00
Tiberiu A Georgescu
cbbb69d3c4 Documentation: update pagemap with shmem exceptions
This patch follows the discussions on previous documentation patch
threads [1][2].  It presents the exception case of shared memory
management from the pagemap's point of view.  It briefly describes what
is missing, why it is missing and alternatives to the pagemap for page
info retrieval in user space.

In short, the kernel does not keep track of PTEs for swapped out shared
pages within the processes that references them.  Thus, the
proc/pid/pagemap tool cannot print the swap destination of the shared
memory pages, instead setting the pagemap entry to zero for both
non-allocated and swapped out pages.  This can create confusion for
users who need information on swapped out pages.

The reasons why maintaining the PTEs of all swapped out shared pages
among all processes while maintaining similar performance is not a
trivial task, or a desirable change, have been discussed extensively
[1][3][4][5].  There are also arguments for why this arguably missing
information should eventually be exposed to the user in either a future
pagemap patch, or by an alternative tool.

[1]: https://marc.info/?m=162878395426774
[2]: https://lore.kernel.org/lkml/20210920164931.175411-1-tiberiu.georgescu@nutanix.com/
[3]: https://lore.kernel.org/lkml/20210730160826.63785-1-tiberiu.georgescu@nutanix.com/
[4]: https://lore.kernel.org/lkml/20210807032521.7591-1-peterx@redhat.com/
[5]: https://lore.kernel.org/lkml/20210715201651.212134-1-peterx@redhat.com/

Mention the current missing information in the pagemap and alternatives
on how to retrieve it, in case someone stumbles upon unexpected
behaviour.

Link: https://lkml.kernel.org/r/20210923064618.157046-1-tiberiu.georgescu@nutanix.com
Link: https://lkml.kernel.org/r/20210923064618.157046-2-tiberiu.georgescu@nutanix.com
Signed-off-by: Tiberiu A Georgescu <tiberiu.georgescu@nutanix.com>
Reviewed-by: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Reviewed-by: Florian Schmidt <florian.schmidt@nutanix.com>
Reviewed-by: Carl Waldspurger <carl.waldspurger@nutanix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:36 -07:00
Shakeel Butt
58056f7750 memcg, kmem: further deprecate kmem.limit_in_bytes
The deprecation process of kmem.limit_in_bytes started with the commit
0158115f70 ("memcg, kmem: deprecate kmem.limit_in_bytes") which also
explains in detail the motivation behind the deprecation.  To summarize,
it is the unexpected behavior on hitting the kmem limit.  This patch
moves the deprecation process to the next stage by disallowing to set
the kmem limit.  In future we might just remove the kmem.limit_in_bytes
file completely.

[akpm@linux-foundation.org: s/ENOTSUPP/EOPNOTSUPP/]
[arnd@arndb.de: mark cancel_charge() inline]
  Link: https://lkml.kernel.org/r/20211022070542.679839-1-arnd@kernel.org

Link: https://lkml.kernel.org/r/20211019153408.2916808-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:35 -07:00
Linus Torvalds
411a44c24a Merge tag 'net-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from WiFi (mac80211), and BPF.

  Current release - regressions:

   - skb_expand_head: adjust skb->truesize to fix socket memory
     accounting

   - mptcp: fix corrupt receiver key in MPC + data + checksum

  Previous releases - regressions:

   - multicast: calculate csum of looped-back and forwarded packets

   - cgroup: fix memory leak caused by missing cgroup_bpf_offline

   - cfg80211: fix management registrations locking, prevent list
     corruption

   - cfg80211: correct false positive in bridge/4addr mode check

   - tcp_bpf: fix race in the tcp_bpf_send_verdict resulting in reusing
     previous verdict

  Previous releases - always broken:

   - sctp: enhancements for the verification tag, prevent attackers from
     killing SCTP sessions

   - tipc: fix size validations for the MSG_CRYPTO type

   - mac80211: mesh: fix HE operation element length check, prevent out
     of bound access

   - tls: fix sign of socket errors, prevent positive error codes being
     reported from read()/write()

   - cfg80211: scan: extend RCU protection in
     cfg80211_add_nontrans_list()

   - implement ->sock_is_readable() for UDP and AF_UNIX, fix poll() for
     sockets in a BPF sockmap

   - bpf: fix potential race in tail call compatibility check resulting
     in two operations which would make the map incompatible succeeding

   - bpf: prevent increasing bpf_jit_limit above max

   - bpf: fix error usage of map_fd and fdget() in generic batch update

   - phy: ethtool: lock the phy for consistency of results

   - prevent infinite while loop in skb_tx_hash() when Tx races with
     driver reconfiguring the queue <> traffic class mapping

   - usbnet: fixes for bad HW conjured by syzbot

   - xen: stop tx queues during live migration, prevent UAF

   - net-sysfs: initialize uid and gid before calling
     net_ns_get_ownership

   - mlxsw: prevent Rx stalls under memory pressure"

* tag 'net-5.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
  Revert "net: hns3: fix pause config problem after autoneg disabled"
  mptcp: fix corrupt receiver key in MPC + data + checksum
  riscv, bpf: Fix potential NULL dereference
  octeontx2-af: Fix possible null pointer dereference.
  octeontx2-af: Display all enabled PF VF rsrc_alloc entries.
  octeontx2-af: Check whether ipolicers exists
  net: ethernet: microchip: lan743x: Fix skb allocation failure
  net/tls: Fix flipped sign in async_wait.err assignment
  net/tls: Fix flipped sign in tls_err_abort() calls
  net/smc: Correct spelling mistake to TCPF_SYN_RECV
  net/smc: Fix smc_link->llc_testlink_time overflow
  nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
  vmxnet3: do not stop tx queues after netif_device_detach()
  r8169: Add device 10ec:8162 to driver r8169
  ptp: Document the PTP_CLK_MAGIC ioctl number
  usbnet: fix error return code in usbnet_probe()
  net: hns3: adjust string spaces of some parameters of tx bd info in debugfs
  net: hns3: expand buffer len for some debugfs command
  net: hns3: add more string spaces for dumping packets number of queue info in debugfs
  net: hns3: fix data endian problem of some functions of debugfs
  ...
2021-10-28 10:17:31 -07:00
Randy Dunlap
f821615167 ptp: Document the PTP_CLK_MAGIC ioctl number
Add PTP_CLK_MAGIC to the userspace-api/ioctl/ioctl-number.rst
documentation file.

Fixes: d94ba80ebb ("ptp: Added a brand new class driver for ptp clocks.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20211024163831.10200-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 17:02:51 -07:00
Linus Torvalds
a51aec4109 Merge tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
 "Some late pin control fixes, the most generally annoying will probably
  be the AMD IRQ storm fix affecting the Microsoft surface.

  Summary:

   - Three fixes pertaining to Broadcom DT bindings. Some stuff didn't
     work out as inteded, we need to back out

   - A resume bug fix in the STM32 driver

   - Disable and mask the interrupts on probe in the AMD pinctrl driver,
     affecting Microsoft surface"

* tag 'pinctrl-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: amd: disable and mask interrupts on probe
  pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume()
  Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode"
  dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example
  Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
2021-10-25 09:47:18 -07:00
Linus Torvalds
6c2c712767 Merge tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, and can.

  We'll have one more fix for a socket accounting regression, it's still
  getting polished. Otherwise things look fine.

  Current release - regressions:

   - revert "vrf: reset skb conntrack connection on VRF rcv", there are
     valid uses for previous behavior

   - can: m_can: fix iomap_read_fifo() and iomap_write_fifo()

  Current release - new code bugs:

   - mlx5: e-switch, return correct error code on group creation failure

  Previous releases - regressions:

   - sctp: fix transport encap_port update in sctp_vtag_verify

   - stmmac: fix E2E delay mechanism (in PTP timestamping)

  Previous releases - always broken:

   - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr

   - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of
     init

   - netfilter: ipvs: make global sysctl read-only in non-init netns

   - tcp: md5: fix selection between vrf and non-vrf keys

   - ipv6: count rx stats on the orig netdev when forwarding

   - bridge: mcast: use multicast_membership_interval for IGMPv3

   - can:
      - j1939: fix UAF for rx_kref of j1939_priv abort sessions on
        receiving bad messages

      - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix
        return error on FC timeout on TX path

   - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited

   - hns3: schedule the polling again when allocation fails, prevent
     stalls

   - drivers: add missing of_node_put() when aborting
     for_each_available_child_of_node()

   - ptp: fix possible memory leak and UAF in ptp_clock_register()

   - e1000e: fix packet loss in burst mode on Tiger Lake and later

   - mlx5e: ipsec: fix more checksum offload issues"

* tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
  usbnet: sanity check for maxpacket
  net: enetc: make sure all traffic classes can send large frames
  net: enetc: fix ethtool counter name for PM0_TERR
  ptp: free 'vclock_index' in ptp_clock_release()
  sfc: Don't use netif_info before net_device setup
  sfc: Export fibre-specific supported link modes
  net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags
  net/mlx5e: IPsec: Fix a misuse of the software parser's fields
  net/mlx5e: Fix vlan data lost during suspend flow
  net/mlx5: E-switch, Return correct error code on group creation failure
  net/mlx5: Lag, change multipath and bonding to be mutually exclusive
  ice: Add missing E810 device ids
  igc: Update I226_K device ID
  e1000e: Fix packet loss on Tiger Lake and later
  e1000e: Separate TGP board type from SPT
  ptp: Fix possible memory leak in ptp_clock_register()
  net: stmmac: Fix E2E delay mechanism
  nfc: st95hf: Make spi remove() callback return zero
  net: hns3: disable sriov before unload hclge layer
  net: hns3: fix vf reset workqueue cannot exit
  ...
2021-10-21 15:36:50 -10:00
Jeremy Kerr
b416beb25d mctp: unify sockaddr_mctp types
Use the more precise __kernel_sa_family_t for smctp_family, to match
struct sockaddr.

Also, use an unsigned int for the network member; negative networks
don't make much sense. We're already using unsigned for mctp_dev and
mctp_skb_cb, but need to change mctp_sock to suit.

Fixes: 60fc639816 ("mctp: Add sockaddr_mctp to uapi")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18 13:47:09 +01:00
Linus Torvalds
3bb50f8530 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
 "Fixes up some issues in rc5"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-vdpa: Fix the wrong input in config_cb
  VDUSE: fix documentation underline warning
  Revert "virtio-blk: Add validation for block size in config space"
  vhost_vdpa: unset vq irq before freeing irq
  virtio: write back F_VERSION_1 before validate
2021-10-17 18:17:19 -10:00
Jakub Kicinski
2151135a1f Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-10-14

Brett ensures RDMA nodes are removed during release and rebuild. He also
corrects fw.mgmt.api to include the patch number for proper
identification.

Dave stops ida_free() being called when an IDA has not been allocated.

Michal corrects the order of parameters being provided and the number of
entries skipped for UDP tunnels.
====================

Link: https://lore.kernel.org/r/20211014181953.3538330-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-15 17:20:16 -07:00
Linus Torvalds
985f6ab93f Merge tag 'spi-fix-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
 "A few small fixes.

  Mostly driver specific but there's one in the core which fixes a
  deadlock when adding devices on spi-mux that's triggered because
  spi-mux is a SPI device which is itself a SPI controller and so can
  instantiate devices when registered.

  We were using a global lock to protect against reusing chip selects
  but they're a per controller thing so moving the lock per controller
  resolves that"

* tag 'spi-fix-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi-mux: Fix false-positive lockdep splats
  spi: Fix deadlock when adding SPI controllers on SPI buses
  spi: bcm-qspi: clear MSPI spifie interrupt during probe
  spi: spi-nxp-fspi: don't depend on a specific node name erratum workaround
  spi: mediatek: skip delays if they are 0
  spi: atmel: Fix PDC transfer setup bug
  spi: spidev: Add SPI ID table
  spi: Use 'flash' node name instead of 'spi-flash' in example
2021-10-15 10:21:46 -04:00
Linus Torvalds
86a44e9067 Merge tag 'ntfs3_for_5.15' of git://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:
 "Use the new api for mounting as requested by Christoph.

  Also fixed:

   - some memory leaks and panic

   - xfstests (tested on x86_64) generic/016 generic/021 generic/022
     generic/041 generic/274 generic/423

   - some typos, wrong returned error codes, dead code, etc"

* tag 'ntfs3_for_5.15' of git://github.com/Paragon-Software-Group/linux-ntfs3: (70 commits)
  fs/ntfs3: Check for NULL pointers in ni_try_remove_attr_list
  fs/ntfs3: Refactor ntfs_read_mft
  fs/ntfs3: Refactor ni_parse_reparse
  fs/ntfs3: Refactor ntfs_create_inode
  fs/ntfs3: Refactor ntfs_readlink_hlp
  fs/ntfs3: Rework ntfs_utf16_to_nls
  fs/ntfs3: Fix memory leak if fill_super failed
  fs/ntfs3: Keep prealloc for all types of files
  fs/ntfs3: Remove unnecessary functions
  fs/ntfs3: Forbid FALLOC_FL_PUNCH_HOLE for normal files
  fs/ntfs3: Refactoring of ntfs_set_ea
  fs/ntfs3: Remove locked argument in ntfs_set_ea
  fs/ntfs3: Use available posix_acl_release instead of ntfs_posix_acl_release
  fs/ntfs3: Check for NULL if ATTR_EA_INFO is incorrect
  fs/ntfs3: Refactoring of ntfs_init_from_boot
  fs/ntfs3: Reject mount if boot's cluster size < media sector size
  fs/ntfs3: Refactoring lock in ntfs_init_acl
  fs/ntfs3: Change posix_acl_equiv_mode to posix_acl_update_mode
  fs/ntfs3: Pass flags to ntfs_set_ea in ntfs_set_acl_ex
  fs/ntfs3: Refactor ntfs_get_acl_ex for better readability
  ...
2021-10-15 09:58:11 -04:00
Linus Torvalds
ec681c53f8 Merge tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Quite calm.

  The noisy DSA driver (embedded switches) changes, and adjustment to
  IPv6 IOAM behavior add to diffstat's bottom line but are not scary.

  Current release - regressions:

   - af_unix: rename UNIX-DGRAM to UNIX to maintain backwards
     compatibility

   - procfs: revert "add seq_puts() statement for dev_mcast", minor
     format change broke user space

  Current release - new code bugs:

   - dsa: fix bridge_num not getting cleared after ports leaving the
     bridge, resource leak

   - dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware
     bridges using VID 0, prevent packet drops if pvid is removed

   - dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware, prevent HW
     getting confused about station to VLAN mapping

  Previous releases - regressions:

   - virtio-net: fix for skb_over_panic inside big mode

   - phy: do not shutdown PHYs in READY state

   - dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's, fix link
     LED staying lit after ifdown

   - mptcp: fix possible infinite wait on recvmsg(MSG_WAITALL)

   - mqprio: Correct stats in mqprio_dump_class_stats()

   - ice: fix deadlock for Tx timestamp tracking flush

   - stmmac: fix feature detection on old hardware

  Previous releases - always broken:

   - sctp: account stream padding length for reconf chunk

   - icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe()

   - isdn: cpai: check ctr->cnr to avoid array index out of bound

   - isdn: mISDN: fix sleeping function called from invalid context

   - nfc: nci: fix potential UAF of rf_conn_info object

   - dsa: microchip: prevent ksz_mib_read_work from kicking back in
     after it's canceled in .remove and crashing

   - dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged
     ports

   - dsa: sja1105, ocelot: break circular dependency between switch and
     tag drivers

   - dsa: felix: improve timestamping in presence of packe loss

   - mlxsw: thermal: fix out-of-bounds memory accesses

  Misc:

   - ipv6: ioam: move the check for undefined bits to improve
     interoperability"

* tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
  icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe
  MAINTAINERS: Update the devicetree documentation path of imx fec driver
  sctp: account stream padding length for reconf chunk
  mlxsw: thermal: Fix out-of-bounds memory accesses
  ethernet: s2io: fix setting mac address during resume
  NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
  NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
  nfc: fix error handling of nfc_proto_register()
  Revert "net: procfs: add seq_puts() statement for dev_mcast"
  net: encx24j600: check error in devm_regmap_init_encx24j600
  net: korina: select CRC32
  net: arc: select CRC32
  net: dsa: felix: break at first CPU port during init and teardown
  net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
  net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
  net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
  net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
  net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
  net: mscc: ocelot: deny TX timestamping of non-PTP packets
  net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
  ...
2021-10-14 18:21:39 -04:00
Brett Creeley
b726ddf984 ice: Print the api_patch as part of the fw.mgmt.api
Currently when a user uses "devlink dev info", the fw.mgmt.api will be
the major.minor numbers as shown below:

devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
  driver ice
  serial_number 00-01-00-ff-ff-00-00-00
  versions:
      fixed:
        board.id K91258-000
      running:
        fw.mgmt 6.1.2
        fw.mgmt.api 1.7 <--- No patch number included
        fw.mgmt.build 0xd75e7d06
        fw.mgmt.srev 5
        fw.undi 1.2992.0
        fw.undi.srev 5
        fw.psid.api 3.10
        fw.bundle_id 0x800085cc
        fw.app.name ICE OS Default Package
        fw.app 1.3.27.0
        fw.app.bundle_id 0xc0000001
        fw.netlist 3.10.2000-3.1e.0
        fw.netlist.build 0x2a76e110
      stored:
        fw.mgmt.srev 5
        fw.undi 1.2992.0
        fw.undi.srev 5
        fw.psid.api 3.10
        fw.bundle_id 0x800085cc
        fw.netlist 3.10.2000-3.1e.0
        fw.netlist.build 0x2a76e110

There are many features in the driver that depend on the major, minor,
and patch version of the FW. Without the patch number in the output for
fw.mgmt.api debugging issues related to the FW API version is difficult.
Also, using major.minor.patch aligns with the existing firmware version
which uses a 3 digit value.

Fix this by making the fw.mgmt.api print the major.minor.patch
versions. Shown below is the result:

devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
  driver ice
  serial_number 00-01-00-ff-ff-00-00-00
  versions:
      fixed:
        board.id K91258-000
      running:
        fw.mgmt 6.1.2
        fw.mgmt.api 1.7.9 <--- patch number included
        fw.mgmt.build 0xd75e7d06
        fw.mgmt.srev 5
        fw.undi 1.2992.0
        fw.undi.srev 5
        fw.psid.api 3.10
        fw.bundle_id 0x800085cc
        fw.app.name ICE OS Default Package
        fw.app 1.3.27.0
        fw.app.bundle_id 0xc0000001
        fw.netlist 3.10.2000-3.1e.0
        fw.netlist.build 0x2a76e110
      stored:
        fw.mgmt.srev 5
        fw.undi 1.2992.0
        fw.undi.srev 5
        fw.psid.api 3.10
        fw.bundle_id 0x800085cc
        fw.netlist 3.10.2000-3.1e.0
        fw.netlist.build 0x2a76e110

Fixes: ff2e5c700e ("ice: add basic handler for devlink .info_get")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14 10:14:46 -07:00
Rafał Miłecki
1d0a779892 dt-bindings: pinctrl: brcm,ns-pinmux: drop unneeded CRU from example
There is no need to include CRU in example of this binding. It wasn't
complete / correct anyway. The proper binding can be find in the
mfd/brcm,cru.yaml .

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211008205938.29925-2-zajec5@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-14 01:09:07 +02:00
Rafał Miłecki
0398adaec3 Revert "dt-bindings: pinctrl: bcm4708-pinmux: rework binding to use syscon"
This reverts commit 2ae80900f2.

My rework was unneeded & wrong. It replaced a clear & correct "reg"
property usage with a custom "offset" one.

Back then I didn't understand how to properly handle CRU block binding.
I heard / read about syscon and tried to use it in a totally invalid
way. That change also missed Rob's review (obviously).

Northstar's pin controller is a simple consistent hardware block that
can be cleanly mapped using a 0x24 long reg space.

Since the rework commit there wasn't any follow up modifying in-kernel
DTS files to use the new binding. Broadcom also isn't known to use that
bugged binding. There is close to zero chance this revert may actually
cause problems / regressions.

This commit is a simple revert. Example binding may (should) be updated
/ cleaned up but that can be handled separately.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211008205938.29925-1-zajec5@gmail.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2021-10-14 01:09:06 +02:00
Randy Dunlap
09b6addf64 VDUSE: fix documentation underline warning
Fix a VDUSE documentation build warning:

Documentation/userspace-api/vduse.rst:21: WARNING: Title underline too short.

Fixes: 7bc7f61897 ("Documentation: Add documentation for VDUSE")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Xie Yongji <xieyongji@bytedance.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Link: https://lore.kernel.org/r/20211006202904.30241-1-rdunlap@infradead.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-10-13 08:42:07 -04:00
Linus Torvalds
459ea72c6c Merge branch 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "All documentation / comment updates"

* 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroupv2, docs: fix misinformation in "device controller" section
  cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
  docs/cgroup: remove some duplicate words
2021-10-11 17:16:41 -07:00
Linus Torvalds
3946b46cab Merge tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:

 - fix two minor issues in the Xen privcmd driver plus a cleanup patch
   for that driver

 - fix multiple issues related to running as PVH guest and some related
   earlyprintk fixes for other Xen guest types

 - fix an issue introduced in 5.15 the Xen balloon driver

* tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: fix cancelled balloon action
  xen/x86: adjust data placement
  x86/PVH: adjust function/data placement
  xen/x86: hook up xen_banner() also for PVH
  xen/x86: generalize preferred console model from PV to PVH Dom0
  xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
  xen/x86: allow "earlyprintk=xen" to work for PV Dom0
  xen/x86: make "earlyprintk=xen" work better for PVH Dom0
  xen/x86: allow PVH Dom0 without XEN_PV=y
  xen/x86: prevent PVH type from getting clobbered
  xen/privcmd: drop "pages" parameter from xen_remap_pfn()
  xen/privcmd: fix error handling in mmap-resource processing
  xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages
2021-10-08 12:55:23 -07:00
Linus Torvalds
0068dc8c96 Merge tag 'drm-fixes-2021-10-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "I've returned from my tropical island retreat, even managed to bring
  one of my kids on a dive with some turtles. Thanks to Daniel for doing
  last week's work.

  Otherwise this is the weekly fixes pull, it's a bit bigger because the
  vc4 reverts in your tree caused some problems with fixes in the
  drm-misc tree so it got left out last week, so this week has the misc
  fixes rebased without the vc4 pieces.

  Otherwise it's i915, amdgpu with the usual fixes and a scattering over
  other drivers.

  I expect things should calm down a bit more next week.

  core:
   - Kconfig fix for fb_simple vs simpledrm.

  i915:
   - Fix RKL HDMI audio
   - Fix runtime pm imbalance on i915_gem_shrink() error path
   - Fix Type-C port access before hw/sw state sync
   - Fix VBT backlight struct version/size check
   - Fix VT-d async flip on SKL/BXT with plane stretch workaround

  amdgpu:
   - DCN 3.1 DP alt mode fixes
   - S0ix gfxoff fix
   - Fix DRM_AMD_DC_SI dependencies
   - PCIe DPC handling fix
   - DCN 3.1 scaling fix
   - Documentation fix

  amdkfd:
   - Fix potential memory leak
   - IOMMUv2 init fixes

  vc4 (there were some hdmi fixes but things got reverted, sort it out
       later):
   - compiler fix

  nouveau:
   - Cursor fix
   - Fix ttm buffer moves for ampere gpu's by adding minimal
     acceleration support.
   - memory leak fixes

  rockchip:
   - crtc/clk fixup

  panel:
   - ili9341 Fix DT bindings indent
   - y030xx067a - yellow tint init seq fix

  gbefb:
   - Fix gbefb when built with COMPILE_TEST"

* tag 'drm-fixes-2021-10-08' of git://anongit.freedesktop.org/drm/drm: (33 commits)
  drm/amd/display: Fix detection of 4 lane for DPALT
  drm/amd/display: Limit display scaling to up to 4k for DCN 3.1
  drm/amd/display: Skip override for preferred link settings during link training
  drm/nouveau/debugfs: fix file release memory leak
  drm/nouveau/kms/nv50-: fix file release memory leak
  drm/nouveau: avoid a use-after-free when BO init fails
  DRM: delete DRM IRQ legacy midlayer docs
  video: fbdev: gbefb: Only instantiate device when built for IP32
  fbdev: simplefb: fix Kconfig dependencies
  drm/panel: abt-y030xx067a: yellow tint fix
  dt-bindings: panel: ili9341: correct indentation
  drm/nouveau/fifo/ga102: initialise chid on return from channel creation
  drm/rockchip: Update crtc fixup to account for fractional clk change
  drm/nouveau/ga102-: support ttm buffer moves via copy engine
  drm/nouveau/kms/tu102-: delay enabling cursor until after assign_windows
  drm/sun4i: dw-hdmi: Fix HDMI PHY clock setup
  drm/vc4: hdmi: Remove unused struct
  drm/kmb: Enable alpha blended second plane
  drm/amdgpu: handle the case of pci_channel_io_frozen only in amdgpu_pci_resume
  drm/amdgpu: init iommu after amdkfd device init
  ...
2021-10-08 09:58:50 -07:00
Herve Codina
3781b6ad2e dt-bindings: net: snps,dwmac: add dwmac 3.40a IP version
dwmac 3.40a is an old ip version that can be found on SPEAr3xx soc.

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08 16:22:39 +01:00
Dave Airlie
bf79045e0e Merge tag 'amd-drm-fixes-5.15-2021-10-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.15-2021-10-06:

amdgpu:
- DCN 3.1 DP alt mode fixes
- S0ix gfxoff fix
- Fix DRM_AMD_DC_SI dependencies
- PCIe DPC handling fix
- DCN 3.1 scaling fix
- Documentation fix

amdkfd:
- Fix potential memory leak
- IOMMUv2 init fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211006203828.4818-1-alexander.deucher@amd.com
2021-10-08 11:40:21 +10:00
Linus Torvalds
4a16df549d Merge tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from xfrm, bpf, netfilter, and wireless.

  Current release - regressions:

   - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new
     value in the middle of an enum

   - unix: fix an issue in unix_shutdown causing the other end
     read/write failures

   - phy: mdio: fix memory leak

  Current release - new code bugs:

   - mlx5e: improve MQPRIO resiliency against bad configs

  Previous releases - regressions:

   - bpf: fix integer overflow leading to OOB access in map element
     pre-allocation

   - stmmac: dwmac-rk: fix ethernet on rk3399 based devices

   - netfilter: conntrack: fix boot failure with
     nf_conntrack.enable_hooks=1

   - brcmfmac: revert using ISO3166 country code and 0 rev as fallback

   - i40e: fix freeing of uninitialized misc IRQ vector

   - iavf: fix double unlock of crit_lock

  Previous releases - always broken:

   - bpf, arm: fix register clobbering in div/mod implementation

   - netfilter: nf_tables: correct issues in netlink rule change event
     notifications

   - dsa: tag_dsa: fix mask for trunked packets

   - usb: r8152: don't resubmit rx immediately to avoid soft lockup on
     device unplug

   - i40e: fix endless loop under rtnl if FW fails to correctly respond
     to capability query

   - mlx5e: fix rx checksum offload coexistence with ipsec offload

   - mlx5: force round second at 1PPS out start time and allow it only
     in supported clock modes

   - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
     sequence

  Misc:

   - xfrm: slightly rejig the new policy uAPI to make it less cryptic"

* tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  net: prefer socket bound to interface when not in VRF
  iavf: fix double unlock of crit_lock
  i40e: Fix freeing of uninitialized misc IRQ vector
  i40e: fix endless loop under rtnl
  dt-bindings: net: dsa: marvell: fix compatible in example
  ionic: move filter sync_needed bit set
  gve: report 64bit tx_bytes counter from gve_handle_report_stats()
  gve: fix gve_get_stats()
  rtnetlink: fix if_nlmsg_stats_size() under estimation
  gve: Properly handle errors in gve_assign_qpl
  gve: Avoid freeing NULL pointer
  gve: Correct available tx qpl check
  unix: Fix an issue in unix_shutdown causing the other end read/write failures
  net: stmmac: trigger PCS EEE to turn off on link down
  net: pcs: xpcs: fix incorrect steps on disable EEE
  netlink: annotate data races around nlk->bound
  net: pcs: xpcs: fix incorrect CL37 AN sequence
  net: sfp: Fix typo in state machine debug string
  net/sched: sch_taprio: properly cancel timer from taprio_destroy()
  net: bridge: fix under estimation in br_get_linkxstats_size()
  ...
2021-10-07 09:50:31 -07:00
Linus Torvalds
5af4055fa8 Merge tag 'devicetree-fixes-for-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:

 - Add another allowed address for TI sn65dsi86

 - Drop more redundant minItems/maxItems

 - Fix more graph 'unevaluatedProperties' warnings in media bindings

* tag 'devicetree-fixes-for-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: drm/bridge: ti-sn65dsi86: Fix reg value
  dt-bindings: Drop more redundant 'maxItems/minItems'
  dt-bindings: media: Fix more graph 'unevaluatedProperties' related warnings
2021-10-06 18:26:36 -07:00
Marcel Ziswiler
a50a059523 dt-bindings: net: dsa: marvell: fix compatible in example
While the MV88E6390 switch chip exists, one is supposed to use a
compatible of "marvell,mv88e6190" for it. Fix this in the given example.

Signed-off-by: Marcel Ziswiler <marcel@ziswiler.com>
Fixes: a3c53be55c ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:20:40 +01:00
Randy Dunlap
b67929808f DRM: delete DRM IRQ legacy midlayer docs
Remove documentation associated with the removal of the DRM IRQ legacy
midlayer.

Eliminates these documentation warnings:

../drivers/gpu/drm/drm_irq.c:1: warning: 'irq helpers' not found
../drivers/gpu/drm/drm_irq.c:1: warning: no structured comments found

Fixes: c1736b9008 ("drm: IRQ midlayer is now legacy")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: dri-devel@lists.freedesktop.org
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20211005025312.20913-1-rdunlap@infradead.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-10-06 11:12:29 +02:00
Krzysztof Kozlowski
990a9ff072 dt-bindings: panel: ili9341: correct indentation
Correct indentation warning:
  ilitek,ili9341.yaml:25:9: [warning] wrong indentation: expected 10 but found 8 (indentation)

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20210819101020.26368-1-krzysztof.kozlowski@canonical.com
(cherry picked from commit 333ba0d9d5)
Link: https://lore.kernel.org/dri-devel/CAL_JsqKcTfgnXNYzGDSFhKS2udhw2Dvk04ODwTxUdDRQjKdT0Q@mail.gmail.com/
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2021-10-06 11:12:28 +02:00
Alex Deucher
d08ce8c6d2 Documentation/gpu: remove spurious "+" in amdgpu.rst
Not sure why that was there.  Remove it.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-05 10:54:53 -04:00
Jan Beulich
42bc9716bc xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
xenboot_write_console() is dealing with these quite fine so I don't see
why xenboot_console_setup() would return -ENOENT in this case.

Adjust documentation accordingly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

Link: https://lore.kernel.org/r/3d212583-700e-8b2d-727a-845ef33ac265@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2021-10-05 08:36:05 +02:00
Geert Uytterhoeven
b2d70c0dbf dt-bindings: drm/bridge: ti-sn65dsi86: Fix reg value
make dtbs_check:

    arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dt.yaml: bridge@2c: reg:0:0: 45 was expected

According to the datasheet, the I2C address can be either 0x2c or 0x2d,
depending on the ADDR control input.

Fixes: e3896e6ddd ("dt-bindings: drm/bridge: Document sn65dsi86 bridge bindings")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Link: https://lore.kernel.org/r/08f73c2aa0d4e580303357dfae107d084d962835.1632486753.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
2021-10-04 12:01:59 -05:00