Commit Graph

797941 Commits

Author SHA1 Message Date
Quentin Perret
630246a06a sched/fair: Clean-up update_sg_lb_stats parameters
In preparation for the introduction of a new root domain flag which can
be set during load balance (the 'overutilized' flag), clean-up the set
of parameters passed to update_sg_lb_stats(). More specifically, the
'local_group' and 'local_idx' parameters can be removed since they can
easily be reconstructed from within the function.

While at it, transform the 'overload' parameter into a flag stored in
the 'sg_status' parameter hence facilitating the definition of new flags
when needed.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-12-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:17:01 +01:00
Quentin Perret
1f74de8798 sched/toplogy: Introduce the 'sched_energy_present' static key
In order to make sure Energy Aware Scheduling (EAS) will not impact
systems where no Energy Model is available, introduce a static key
guarding the access to EAS code. Since EAS is enabled on a
per-root-domain basis, the static key is enabled when at least one root
domain meets all conditions for EAS.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-10-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:17:01 +01:00
Quentin Perret
531b5c9f5c sched/topology: Make Energy Aware Scheduling depend on schedutil
Energy Aware Scheduling (EAS) is designed with the assumption that
frequencies of CPUs follow their utilization value. When using a CPUFreq
governor other than schedutil, the chances of this assumption being true
are small, if any. When schedutil is being used, EAS' predictions are at
least consistent with the frequency requests. Although those requests
have no guarantees to be honored by the hardware, they should at least
guide DVFS in the right direction and provide some hope in regards to the
EAS model being accurate.

To make sure EAS is only used in a sane configuration, create a strong
dependency on schedutil being used. Since having sugov compiled-in does
not provide that guarantee, make CPUFreq call a scheduler function on
governor changes hence letting it rebuild the scheduling domains, check
the governors of the online CPUs, and enable/disable EAS accordingly.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:17:00 +01:00
Quentin Perret
b68a4c0dba sched/topology: Disable EAS on inappropriate platforms
Energy Aware Scheduling (EAS) in its current form is most relevant on
platforms with asymmetric CPU topologies (e.g. Arm big.LITTLE) since
this is where there is a lot of potential for saving energy through
scheduling. This is particularly true since the Energy Model only
includes the active power costs of CPUs, hence not providing enough data
to compare packing-vs-spreading strategies.

As such, disable EAS on root domains where the SD_ASYM_CPUCAPACITY flag
is not set. While at it, disable EAS on systems where the complexity of
the Energy Model is too high since that could lead to unacceptable
scheduling overhead.

All in all, EAS can be used on a root domain if and only if:
  1. an Energy Model is available;
  2. the root domain has an asymmetric CPU capacity topology;
  3. the complexity of the root domain's EM is low enough to keep
     scheduling overheads low.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-8-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:17:00 +01:00
Quentin Perret
011b27bb5d sched/topology: Add lowest CPU asymmetry sched_domain level pointer
Add another member to the family of per-cpu sched_domain shortcut
pointers. This one, sd_asym_cpucapacity, points to the lowest level
at which the SD_ASYM_CPUCAPACITY flag is set. While at it, rename the
sd_asym shortcut to sd_asym_packing to avoid confusions.

Generally speaking, the largest opportunity to save energy via
scheduling comes from a smarter exploitation of heterogeneous platforms
(i.e. big.LITTLE). Consequently, the sd_asym_cpucapacity shortcut will
be used at first as the lowest domain where Energy-Aware Scheduling
(EAS) should be applied. For example, it is possible to apply EAS within
a socket on a multi-socket system, as long as each socket has an
asymmetric topology. Energy-aware cross-sockets wake-up balancing will
only happen when the system is over-utilized, or this_cpu and prev_cpu
are in different sockets.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-7-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:59 +01:00
Quentin Perret
6aa140fa45 sched/topology: Reference the Energy Model of CPUs when available
The existing scheduling domain hierarchy is defined to map to the cache
topology of the system. However, Energy Aware Scheduling (EAS) requires
more knowledge about the platform, and specifically needs to know about
the span of Performance Domains (PD), which do not always align with
caches.

To address this issue, use the Energy Model (EM) of the system to extend
the scheduler topology code with a representation of the PDs, alongside
the scheduling domains. More specifically, a linked list of PDs is
attached to each root domain. When multiple root domains are in use,
each list contains only the PDs covering the CPUs of its root domain. If
a PD spans over CPUs of multiple different root domains, it will be
duplicated in all lists.

The lists are fully maintained by the scheduler from
partition_sched_domains() in order to cope with hotplug and cpuset
changes. As for scheduling domains, the list are protected by RCU to
ensure safe concurrent updates.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-6-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:59 +01:00
Quentin Perret
27871f7a8a PM: Introduce an Energy Model management framework
Several subsystems in the kernel (task scheduler and/or thermal at the
time of writing) can benefit from knowing about the energy consumed by
CPUs. Yet, this information can come from different sources (DT or
firmware for example), in different formats, hence making it hard to
exploit without a standard API.

As an attempt to address this, introduce a centralized Energy Model
(EM) management framework which aggregates the power values provided
by drivers into a table for each performance domain in the system. The
power cost tables are made available to interested clients (e.g. task
scheduler or thermal) via platform-agnostic APIs. The overall design
is represented by the diagram below (focused on Arm-related drivers as
an example, but applicable to any architecture):

     +---------------+  +-----------------+  +-------------+
     | Thermal (IPA) |  | Scheduler (EAS) |  |    Other    |
     +---------------+  +-----------------+  +-------------+
             |                   | em_pd_energy()   |
             |                   | em_cpu_get()     |
             +-----------+       |         +--------+
                         |       |         |
                         v       v         v
                      +---------------------+
                      |                     |
                      |    Energy Model     |
                      |                     |
                      |     Framework       |
                      |                     |
                      +---------------------+
                         ^       ^       ^
                         |       |       | em_register_perf_domain()
              +----------+       |       +---------+
              |                  |                 |
      +---------------+  +---------------+  +--------------+
      |  cpufreq-dt   |  |   arm_scmi    |  |    Other     |
      +---------------+  +---------------+  +--------------+
              ^                  ^                 ^
              |                  |                 |
      +--------------+   +---------------+  +--------------+
      | Device Tree  |   |   Firmware    |  |      ?       |
      +--------------+   +---------------+  +--------------+

Drivers (typically, but not limited to, CPUFreq drivers) can register
data in the EM framework using the em_register_perf_domain() API. The
calling driver must provide a callback function with a standardized
signature that will be used by the EM framework to build the power
cost tables of the performance domain. This design should offer a lot of
flexibility to calling drivers which are free of reading information
from any location and to use any technique to compute power costs.
Moreover, the capacity states registered by drivers in the EM framework
are not required to match real performance states of the target. This
is particularly important on targets where the performance states are
not known by the OS.

The power cost coefficients managed by the EM framework are specified in
milli-watts. Although the two potential users of those coefficients (IPA
and EAS) only need relative correctness, IPA specifically needs to
compare the power of CPUs with the power of other components (GPUs, for
example), which are still expressed in absolute terms in their
respective subsystems. Hence, specifying the power of CPUs in
milli-watts should help transitioning IPA to using the EM framework
without introducing new problems by keeping units comparable across
sub-systems.
On the longer term, the EM of other devices than CPUs could also be
managed by the EM framework, which would enable to remove the absolute
unit. However, this is not absolutely required as a first step, so this
extension of the EM framework is left for later.

On the client side, the EM framework offers APIs to access the power
cost tables of a CPU (em_cpu_get()), and to estimate the energy
consumed by the CPUs of a performance domain (em_pd_energy()). Clients
such as the task scheduler can then use these APIs to access the shared
data structures holding the Energy Model of CPUs.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:58 +01:00
Quentin Perret
938e5e4b0d sched/cpufreq: Prepare schedutil for Energy Aware Scheduling
Schedutil requests frequency by aggregating utilization signals from
the scheduler (CFS, RT, DL, IRQ) and applying a 25% margin on top of
them. Since Energy Aware Scheduling (EAS) needs to be able to predict
the frequency requests, it needs to forecast the decisions made by the
governor.

In order to prepare the introduction of EAS, introduce
schedutil_freq_util() to centralize the aforementioned signal
aggregation and make it available to both schedutil and EAS. Since
frequency selection and energy estimation still need to deal with RT and
DL signals slightly differently, schedutil_freq_util() is called with a
different 'type' parameter in those two contexts, and returns an
aggregated utilization signal accordingly. While at it, introduce the
map_util_freq() function which is designed to make schedutil's 25%
margin usable easily for both sugov and EAS.

As EAS will be able to predict schedutil's frequency requests more
accurately than any other governor by design, it'd be sensible to make
sure EAS cannot be used without schedutil. This will be done later, once
EAS has actually been introduced.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-3-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:58 +01:00
Quentin Perret
5bd0988be1 sched/topology: Relocate arch_scale_cpu_capacity() to the internal header
By default, arch_scale_cpu_capacity() is only visible from within the
kernel/sched folder. Relocate it to include/linux/sched/topology.h to
make it visible to other clients needing to know about the capacity of
CPUs, such as the Energy Model framework.

This also shrinks the <linux/sched/topology.h> public header.

Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: morten.rasmussen@arm.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-2-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:58 +01:00
Yangtao Li
9ebc605381 sched/core: Remove unnecessary unlikely() in push_*_task()
WARN_ON() already contains an unlikely(), so it's not necessary to
use WARN_ON(1).

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181103172602.1917-1-tiny.windzz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:57 +01:00
Vincent Guittot
765d0af19f sched/topology: Remove the ::smt_gain field from 'struct sched_domain'
::smt_gain is used to compute the capacity of CPUs of a SMT core with the
constraint 1 < ::smt_gain < 2 in order to be able to compute number of CPUs
per core. The field has_free_capacity of struct numa_stat, which was the
last user of this computation of number of CPUs per core, has been removed
by:

  2d4056fafa ("sched/numa: Remove numa_has_capacity()")

We can now remove this constraint on core capacity and use the defautl value
SCHED_CAPACITY_SCALE for SMT CPUs. With this remove, SCHED_CAPACITY_SCALE
becomes the maximum compute capacity of CPUs on every systems. This should
help to simplify some code and remove fields like rd->max_cpu_capacity

Furthermore, arch_scale_cpu_capacity() is used with a NULL sd in several other
places in the code when it wants the capacity of a CPUs to scale
some metrics like in pelt, deadline or schedutil. In case on SMT, the value
returned is not the capacity of SMT CPUs but default SCHED_CAPACITY_SCALE.

So remove it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1535548752-4434-4-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-11 15:16:57 +01:00
Ingo Molnar
dfcb245e28 sched: Fix various typos in comments
Go over the scheduler source code and fix common typos
in comments - and a typo in an actual variable name.

No change in functionality intended.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 11:55:42 +01:00
Ingo Molnar
5f675231e4 Linux 4.20-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwEZdIeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGAlQH/19oax2Za3IPqF4X
 DM3lal5M6zlUVkoYstqzpbR3MqUwgEnMfvoeMDC6mI9N4/+r2LkV7cRR8HzqQCCS
 jDfD69IzRGb52VSeJmbOrkxBWsR1Nn0t4Z3rEeLPxwaOoNpRc8H973MbAQ2FKMpY
 S4Y3jIK1dNiRRxdh52NupVkQF+djAUwkBuVk/rrvRJmTDij4la03cuCDAO+Di9lt
 GHlVvygKw2SJhDR+z3ArwZNmE0ceCcE6+W7zPHzj2KeWuKrZg22kfUD454f2YEIw
 FG0hu9qecgtpYCkLSm2vr4jQzmpsDoyq3ZfwhjGrP4qtvPC3Db3vL3dbQnkzUcJu
 JtwhVCE=
 =O1q1
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc5' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 11:42:17 +01:00
Linus Torvalds
2595646791 Linux 4.20-rc5 2018-12-02 15:07:55 -08:00
Linus Torvalds
6a51272609 ARM: SoC fixes
Volume is a little higher than usual due to a set of gpio fixes for
 Davinci platforms that's been around a while, still seemed appropriate
 to not hold off until next merge window.
 
 Besides that it's the usual mix of minor fixes, mostly corrections of
 small stuff in device trees.
 
 Major stability-related one is the removal of a regulator from DT on
 Rock960, since DVFS caused undervoltage. I expect it'll be restored once
 they figure out the underlying issue.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAlwEMLEPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx3WewP/2shRXHQ8mSwMbqLApBUgPASpGtyJsLgP4vX
 ROMHdfQr2nhZPu9vy973aVAztkG3FCsWNhKqNVWTfvNf9eNYRh62D8/gqNYavQJH
 Gtq/TpiJWWDWoXzxHpOE5vSunNDUGWRrbigmgcONogNs42iX0ngLAy7GWzWHB7oc
 O3HAYxNevsBTJkkKpKGnDqDM1P4WaEG5OPdjMUN25UD7IBshzuVq4eG3LuqLLZ01
 NzGV/RErCnnLP8VSJlu+LQkLBeO5WpcvqZMeC6lNGBEBQAscTYRTucmM9tflJgCK
 B3+GczLFdJXKwluVV055MfrBxUweZ+Tm2gk7Ojtou/ozhFOdWICVT6KSwTHiOUIB
 ZDP/f56QfJCxxc/NFX5fJHSaYhXl+tj1HVxwG/dK/l3blMOX5I7cZkBKnjI9sDVl
 H3on9r5S3j1x1T534zf/n0OUwztIBmPiEZTPeoz6L1HuqpusmWJZB3knW6RnA4Lv
 3JQPowK2k97/3Xp4xnzl5rQreBomXv1hsszXmPKX0pIFXF1C+BQ0LwNd9cC/Hnq2
 dz02JkzoAoEg1L5DYhG63vg/3beg//3Z7uGNMu4LMcaNlLxl5AqMM7O18qJCfMth
 nFZRx+ZkZ7h8EJqXnMxnXgwHUzWN6Iq2AjKFfmVWRQcDZk+Ys9BlRV5O9m0N0JHb
 KfdtL0SC
 =m41T
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Volume is a little higher than usual due to a set of gpio fixes for
  Davinci platforms that's been around a while, still seemed appropriate
  to not hold off until next merge window.

  Besides that it's the usual mix of minor fixes, mostly corrections of
  small stuff in device trees.

  Major stability-related one is the removal of a regulator from DT on
  Rock960, since DVFS caused undervoltage. I expect it'll be restored
  once they figure out the underlying issue"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
  MAINTAINERS: Remove unused Qualcomm SoC mailing list
  ARM: davinci: dm644x: set the GPIO base to 0
  ARM: davinci: da830: set the GPIO base to 0
  ARM: davinci: dm355: set the GPIO base to 0
  ARM: davinci: dm646x: set the GPIO base to 0
  ARM: davinci: dm365: set the GPIO base to 0
  ARM: davinci: da850: set the GPIO base to 0
  gpio: davinci: restore a way to manually specify the GPIO base
  ARM: davinci: dm644x: define gpio interrupts as separate resources
  ARM: davinci: dm355: define gpio interrupts as separate resources
  ARM: davinci: dm646x: define gpio interrupts as separate resources
  ARM: davinci: dm365: define gpio interrupts as separate resources
  ARM: davinci: da8xx: define gpio interrupts as separate resources
  ARM: dts: at91: sama5d2: use the divided clock for SMC
  ARM: dts: imx51-zii-rdu1: Remove EEPROM node
  ARM: dts: rockchip: Remove @0 from the veyron memory node
  arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
  arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP
  arm64: dts: sdm845-mtp: Reserve reserved gpios
  arm64: dts: ti: k3-am654: Fix wakeup_uart reg address
  ...
2018-12-02 12:19:44 -08:00
Linus Torvalds
292974c5ac xen: fixes for 4.20-rc5
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXAOQWAAKCRCAXGG7T9hj
 vqqLAQCC+eJNcIEGXm2B5SpadB4v/TKPk+GYRK/zUP0FVPHo3AEAoSTZlmCUUSno
 7me8n0sTrd4+ZKMGHvFgAYbpoO90XQM=
 =BxP8
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - A revert of a previous commit as it is no longer necessary and has
   shown to cause problems in some memory hotplug cases.

 - Some small fixes and a minor cleanup.

 - A patch for adding better diagnostic data in a very rare failure
   case.

* tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  pvcalls-front: fixes incorrect error handling
  Revert "xen/balloon: Mark unallocated host memory as UNUSABLE"
  xen: xlate_mmu: add missing header to fix 'W=1' warning
  xen/x86: add diagnostic printout to xen_mc_flush() in case of error
  x86/xen: cleanup includes in arch/x86/xen/spinlock.c
2018-12-02 12:15:55 -08:00
Linus Torvalds
a234c7371f dmaengine-4.20-rc5
dmaengine fixes for v4.20-rc5
 
 This contains two fixes to at_hdmac which fixes long standing bus
 reported recently on serial transfers causing memory leak. These
 fixes were done by Richard Genoud.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcA5DaAAoJEHwUBw8lI4NHFtAP/AldipVXEyg6VbZGpLoIKcaP
 SdM8vMgeLJBLMdIpP6vxC/aZgHw2J9RFlKf+AljW2VAs3/fH8Nsfki7zNyA+yMzd
 lzyaGuzTIBC7JCz2ZY9OCPnLPeGO6YAFoZI8y2vQOYRsR6BpJd9iTQ0H0ncSFTxJ
 9S5cJ8nq4l9rRz26lKASfdPCPLtOWv03VTAM7NZNdb9Ai27SxRlBYGRQIamwoYsf
 xXmwRH+Kz8n1ajNmXtf75wX1X3yfR+/t+nJ7zOpcSb1+OQV2E+ZUiY+9u5BCr/Jp
 CHmzfy6G6f6YExc02FWvnfzsNODTyMnBce9CLJ5AsedZ7LsmcC8V8Y+K70w86w2A
 Tr1Jqt4Gj1C1vQ4ABkHkXf1ZIjMP0wCfoYN9PMBy7nfahfNXIW0JnqlHzRWkJRB1
 8amB4d63e4tdztLZxEVkNIm5ylIzbPSN3kSLYw7Pna4R8tk2wf4CtiZTn7CKlvjz
 m1hCkN80idh/Oaftpt9zJgGdkqaZ9hoUDL2Hq8ZzhGqWRD/u4ZLQctqH1Z1DxZjO
 3f6i2fsdjDpACfy5pd1Kj65zS/PdzWHx2Vws8H5BwFpBRgKdeK9yiYGPDFHZtVct
 hg6wmmb/sMaFgL9OAkmxvKInyFPMd0EQ+fhoVfHU8ON2rZXnsendy3e/MmD9xMer
 sosLDk9RuURxH9UBTFJ8
 =6BFf
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "This contains two fixes to at_hdmac which fixes long standing bus
  reported recently on serial transfers causing memory leak. These fixes
  were done by Richard Genoud"

* tag 'dmaengine-fix-4.20-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: at_hdmac: fix module unloading
  dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
2018-12-02 12:07:27 -08:00
Linus Torvalds
4b78317679 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull STIBP fallout fixes from Thomas Gleixner:
 "The performance destruction department finally got it's act together
  and came up with a cure for the STIPB regression:

   - Provide a command line option to control the spectre v2 user space
     mitigations. Default is either seccomp or prctl (if seccomp is
     disabled in Kconfig). prctl allows mitigation opt-in, seccomp
     enables the migitation for sandboxed processes.

   - Rework the code to handle the conditional STIBP/IBPB control and
     remove the now unused ptrace_may_access_sched() optimization
     attempt

   - Disable STIBP automatically when SMT is disabled

   - Optimize the switch_to() logic to avoid MSR writes and invocations
     of __switch_to_xtra().

   - Make the asynchronous speculation TIF updates synchronous to
     prevent stale mitigation state.

  As a general cleanup this also makes retpoline directly depend on
  compiler support and removes the 'minimal retpoline' option which just
  pretended to provide some form of security while providing none"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/speculation: Provide IBPB always command line options
  x86/speculation: Add seccomp Spectre v2 user space protection mode
  x86/speculation: Enable prctl mode for spectre_v2_user
  x86/speculation: Add prctl() control for indirect branch speculation
  x86/speculation: Prepare arch_smt_update() for PRCTL mode
  x86/speculation: Prevent stale SPEC_CTRL msr content
  x86/speculation: Split out TIF update
  ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
  x86/speculation: Prepare for conditional IBPB in switch_mm()
  x86/speculation: Avoid __switch_to_xtra() calls
  x86/process: Consolidate and simplify switch_to_xtra() code
  x86/speculation: Prepare for per task indirect branch speculation control
  x86/speculation: Add command line control for indirect branch speculation
  x86/speculation: Unify conditional spectre v2 print functions
  x86/speculataion: Mark command line parser data __initdata
  x86/speculation: Mark string arrays const correctly
  x86/speculation: Reorder the spec_v2 code
  x86/l1tf: Show actual SMT state
  x86/speculation: Rework SMT state change
  sched/smt: Expose sched_smt_present static key
  ...
2018-12-01 12:35:48 -08:00
Linus Torvalds
880584176e for-linus-20181201
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlwC1c4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppxmD/4pqn8REEh/QUXWhCJbOXLLLxfQju7Uxs/v
 j2Bc6W/e7Z9jvKAs06IIhaV6SxBrM0oUebf/hJY0E/kTSHiNPJqx/X3W9hFYOo+p
 EJau3vavOrxVzgq5zt8S/i//HeanT+H37nE9WDqSRKXTta8JFDw+DoysepILTUvN
 WGDjuplPcurwmf2W1qES+5vNy/Jpln9ErNuqPBSjc6shozQ8WAzvuupVs+uZEpeK
 +gqrx0pJYrtoU+pSUK+Bt6bSzzp8Z0qHGIVMAabNULbz43qblK0ILRE+qLFbFwsB
 62EMMtX9b2Lsvqpoe2cQ+deQlUalsGVmpyE+7GP/evZbVmtD/NoH6cJQ/dA/tFtw
 cluL3rWBJKB5OZ1yatDE2/rUYsGo5FzqMUz/tIWSf2FdZcLfhRNLka7DueSA6NQe
 wtLJU9GrME67+t+PqncjDxoyQYma4oynAcc5dfqlBQv5OP7HDf4TP28g8FdkHjcy
 fEXAp58516YZiCpoWZf6dPR9fUQ0A1eF+qxHnUacy5tHN4AKPrccU3+k+0WStFNf
 qaOPkj4kWtv17d2DO4UoqAtBqFO16QCYSsa5+drpDeTOq9QgGqA6O+sGngN0LsxS
 F7x3msgBIkgEFYFtpuMBXnamdooiZMKrzI0Ctn7PK8b5Qx1OgRNCZcTQD4uql1Fj
 L6R/6Ynibg==
 =lMlT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:

 - Single range elevator discard merge fix, that caused crashes (Ming)

 - Fix for a regression in O_DIRECT, where we could potentially lose the
   error value (Maximilian Heyne)

 - NVMe pull request from Christoph, with little fixes all over the map
   for NVMe.

* tag 'for-linus-20181201' of git://git.kernel.dk/linux-block:
  block: fix single range discard merge
  nvme-rdma: fix double freeing of async event data
  nvme: flush namespace scanning work just before removing namespaces
  nvme: warn when finding multi-port subsystems without multipathing enabled
  fs: fix lost error code in dio_complete
  nvme-pci: fix surprise removal
  nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()
  nvme: Free ctrl device name on init failure
2018-12-01 11:36:32 -08:00
Linus Torvalds
c734b42583 pci-v4.20-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlwCH4wUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxwBA/+NpVj9yl5UGFDeHP04GHpikzft98e
 AQInZnNQOZb0hLzDalwKWuA/yFTUE93VSvqby7hOQojbcI3uRnjy+4gm78fakiEX
 V33tR1cNTe0j+Fk0BBirHaLWWOprhIeXJGoocaSXlGoRbqgBBtQ27eoPrSno/lRe
 QoEFSFdhUmZ6A+oInoBUmrGPgvsX+c2zSUWLDn6CMSHO619C4U0ZGA8wyZhilhCJ
 TxVSd1FWUEVNjTFfmyDi6h240iJKj+FVGOKIDNtIThYrr3V0bRZ/uKQmSSEuaF9m
 QqyKN0bezs5pUXb5WCmG0YywWvmTR2dqv9dOzpsEuoioLbws84Ip6kCFQr2z4AG/
 FDA5CMI1pAFbMOjJJxQ69CZnUY1XVqM2irEd3uZngbhJmxth+0JJNWwkrnGlxlTj
 5JiQSiPihActvlo+D0LulWgBmPzDZuNp9szXnQibVHlMCPBduTE+Nz+6+F7YNTIq
 KT6h2WidbMyduMV+l1Iw2KAsiFYywiHDOrm7dW/2UjeKF+MbsuPm/dcmsJgFZLd7
 ebhI+T3W8JAZTbBGqfvWAGOCpUgLJjYZoCKkaPd0iiHXgTZiVba9G24I/ECLBt+B
 3NZ6RGkMzp4c6gOoDLJSQsK6a4ABsrZ4LUYZIBkPgiFB1eeJG+uMT2zDpBJ01e0L
 dxg94iWjR5def2E=
 =whfV
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Fix a link speed checking interface that broke PCIe gen3 cards in
   gen1 slots (Mikulas Patocka)

 - Fix an imx6 link training error (Trent Piepho)

 - Fix a layerscape outbound window accessor calling error (Hou
   Zhiqiang)

 - Fix a DesignWare endpoint MSI-X address calculation error (Gustavo
   Pimentel)

* tag 'pci-v4.20-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Fix incorrect value returned from pcie_get_speed_cap()
  PCI: dwc: Fix MSI-X EP framework address calculation bug
  PCI: layerscape: Fix wrong invocation of outbound window disable accessor
  PCI: imx6: Fix link training status detection in link up check
2018-12-01 11:32:49 -08:00
Bjorn Helgaas
c74eadf881 Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus
- Fix DesignWare endpoint MSI-X address calculation bug (Gustavo
    Pimentel)

  - Fix Layerscape outbound window disable usage (Hou Zhiqiang)

  - Fix imx6 link up detection (Trent Piepho)

* lorenzo/pci/controller-fixes:
  PCI: dwc: Fix MSI-X EP framework address calculation bug
  PCI: layerscape: Fix wrong invocation of outbound window disable accessor
  PCI: imx6: Fix link training status detection in link up check
2018-11-30 23:42:08 -06:00
Mikulas Patocka
f1f90e254e PCI: Fix incorrect value returned from pcie_get_speed_cap()
The macros PCI_EXP_LNKCAP_SLS_*GB are values, not bit masks.  We must mask
the register and compare it against them.

This fixes errors like this:

  amdgpu: [powerplay] failed to send message 261 ret is 0

when a PCIe-v3 card is plugged into a PCIe-v1 slot, because the slot is
being incorrectly reported as PCIe-v3 capable.

6cf57be0f7, which appeared in v4.17, added pcie_get_speed_cap() with the
incorrect test of PCI_EXP_LNKCAP_SLS as a bitmask.  5d9a633040, which
appeared in v4.19, changed amdgpu to use pcie_get_speed_cap(), so the
amdgpu bug reports below are regressions in v4.19.

Fixes: 6cf57be0f7 ("PCI: Add pcie_get_speed_cap() to find max supported link speed")
Fixes: 5d9a633040 ("drm/amdgpu: use pcie functions for link width and speed")
Link: https://bugs.freedesktop.org/show_bug.cgi?id=108704
Link: https://bugs.freedesktop.org/show_bug.cgi?id=108778
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[bhelgaas: update comment, remove use of PCI_EXP_LNKCAP_SLS_8_0GB and
PCI_EXP_LNKCAP_SLS_16_0GB since those should be covered by PCI_EXP_LNKCAP2,
remove test of PCI_EXP_LNKCAP for zero, since that register is required]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org	# v4.17+
2018-11-30 23:42:03 -06:00
Linus Torvalds
d8f190ee83 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "31 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
  ocfs2: fix potential use after free
  mm/khugepaged: fix the xas_create_range() error path
  mm/khugepaged: collapse_shmem() do not crash on Compound
  mm/khugepaged: collapse_shmem() without freezing new_page
  mm/khugepaged: minor reorderings in collapse_shmem()
  mm/khugepaged: collapse_shmem() remember to clear holes
  mm/khugepaged: fix crashes due to misaccounted holes
  mm/khugepaged: collapse_shmem() stop if punched or truncated
  mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
  mm/huge_memory: splitting set mapping+index before unfreeze
  mm/huge_memory: rename freeze_page() to unmap_page()
  initramfs: clean old path before creating a hardlink
  kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
  psi: make disabling/enabling easier for vendor kernels
  proc: fixup map_files test on arm
  debugobjects: avoid recursive calls with kmemleak
  userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  userfaultfd: shmem: add i_size checks
  userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
  userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
  ...
2018-11-30 18:45:49 -08:00
Linus Torvalds
6c7954b7eb A few more MIPS fixes for 4.20:
- Fix mips_get_syscall_arg() to operate on the task specified when
    detecting o32 tasks running on MIPS64 kernels.
 
  - Fix some incorrect GPIO pin muxing for the MT7620 SoC.
 
  - Update the linux-mips mailing list address.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXAHFuBUccGF1bC5idXJ0
 b25AbWlwcy5jb20ACgkQPqefrLV1AN0eOAEA9RshRqEGpFbTX8fm5N6/2SLlS/Rl
 CZ/79El7LEfesMUA/2Kh9ApEx5cd5/1DhqNBCGUhnLvJt7BaHT49H/7KySAM
 =tCjn
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.20_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull few more MIPS fixes from Paul Burton:

 - Fix mips_get_syscall_arg() to operate on the task specified when
   detecting o32 tasks running on MIPS64 kernels.

 - Fix some incorrect GPIO pin muxing for the MT7620 SoC.

 - Update the linux-mips mailing list address.

* tag 'mips_fixes_4.20_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MAINTAINERS: Update linux-mips mailing list address
  MIPS: ralink: Fix mt7620 nd_sd pinmux
  mips: fix mips_get_syscall_arg o32 check
2018-11-30 18:41:06 -08:00
Linus Torvalds
868dda00b9 - Cortex-A76 erratum workaround
- ftrace fix to enable syscall events on arm64
 
 - Fix uninitialised pointer in iort_get_platform_device_domain()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlwBhskACgkQa9axLQDI
 XvFS5g//UOW/9GvEXRCZF7Okm6FSYw/ADKnrr8Qv39JgKqp5AXG1Adg28QZzdiSD
 E+WKr07SyVj6lDc6gwGO4SzcOFNFO15DgdGY2i9v+cVQu5h/VmS3CiBlJG98WTFe
 Og0mDx3lnHLCUoYADt3YGzWDOXwco0OK2JGKs2Drk4ABoUEDt7dIsDfJtbIOGOpv
 Msx1KnQEuIV3dnZzr0+8PC89nbDG0A8+Mc7KScrESUmjNaO+c5hbcxxScsFswLCJ
 kaX6NttsqqilONt9JrQsDelYLrTP8A0UsYgTb2K36IyB5yCYhzZYMRVMw6wLhrKV
 VfnzjnN/xrJRnPoYW4yDTKLSLbnPuoF8k44XPR8AJA1AE+MLhT+C6yPZ3qcnFR7R
 LXtdDFBihe90HFYIBa1zt+E9jHoOTuWLkXJQTB0kdHjSXwwS0Ji7YuoyEolBQAUd
 QCkYdxSswnl5wGkXqI69V6lJ21lePtXZ8rnnl0lnNQNUyhzcuJFy9M7CcNKHHVcX
 pawnLlu3SJgZKrAR+d8SylSUVHqz3MV/8SuybC7WePl2d/0e4Qhry1y4RhrWuJZJ
 rxGNaBgql3sWmi4aHw65KaYna6YoXrsiwKwl0TK6ZgVzgR4Sk8AJkTk4WYF56ECc
 7E+szTmN3oFm+Bveua9ibryYlx9ayA9wh0UNIrjFCnZDNz9bl4s=
 =HpOx
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Cortex-A76 erratum workaround

 - ftrace fix to enable syscall events on arm64

 - Fix uninitialised pointer in iort_get_platform_device_domain()

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value
  arm64: ftrace: Fix to enable syscall events on arm64
  arm64: Add workaround for Cortex-A76 erratum 1286807
2018-11-30 18:39:07 -08:00
Linus Torvalds
1f817429b2 stackleak plugin fix
- Fix crash by not allowing kprobing of stackleak_erase() (Alexander Popov)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlwBcL4WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjEiD/9WErdSde1/jka9VX9SPdMFush9
 MCb1y0TpS+xlhF7Jyhxz9AJIRVwY0hXSO40RinYPJ4zgxXg+PjVJU/q2HhtGC4u1
 ZdUAblkw1WLEmYjKLuwyLcZmCAlodBpm+hMlYubAD2mfArtX4vLElcMd4livTz7x
 E4lJ/5WKVQzfadJ/onpltPxgQDVrov7u+uatrL7uFfb2geiCNV4lgeBqoJSx2QJd
 i8B7a28uifLS+ph5TJDx39nw7GJ4Iy4nnJofdX23kgKzfei5wef1mwdoaC6x93QL
 rrmhp3T4Mghnxi+txx+u8Bw7VxMiUnmlAoHQQcBQ0oLDMlTY+CQW3o32kvk+oHip
 5onLfQpHgn4kjd+Ns/qFFxr2oHYU0ODbEQ4taVXBu/f0jj0blQdEgLPWVwNpWS7l
 DoFezQ6qNjy1UJlbtg8t7dq1vUF2DTkaRj9JMPCqCBDX+zJrUZpOlCBoB+l4lC6e
 wJSh7m6QCNVh0fmIzctMenkio37rf2rEIp+l0/BMxFz0KsvlEBthb1x5dAlY6RK9
 Wq52oOLzduWTw3683lBnRYBFbsrDo9Y6LmY5wwkFFqx3TGZtcnomUtIqep8AE7YJ
 SH5QtrUUZKPLbPfHTr/k2G75daaJu8TIZ8cgmRmNe+o/LzH2u3IEnfYVteCxeeyj
 Bekcq9lWU5AVm4jNuQ==
 =OU+C
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull stackleak plugin fix from Kees Cook:
 "Fix crash by not allowing kprobing of stackleak_erase() (Alexander
  Popov)"

* tag 'gcc-plugins-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  stackleak: Disable function tracing and kprobes for stackleak_erase()
2018-11-30 18:36:30 -08:00
Linus Torvalds
fd3b3e0ec5 FS-Cache fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXAFfSPu3V2unywtrAQKPQRAAiHDs2d35Kc2qkTFLwGiP+wr+3+7Cyz7A
 hrWAvR7Oe7nBFVPmp6pwEnpBhf3mPsWlQpw3ZKZPo4fDQyRX+mDFC+2C7hkU1Q/J
 BkjTG4vYn1jiQGlL3SD1PfUxcWfwzoK4cz+V3hnFY5y0dsKiBZBR1Lw5G+UkaCnD
 4VaC3VAG56Vh14o5qSF3TWLZFyZ+JN6YA/M/DnwRPl8y4jnj1tJLs1DjdpEcWv6r
 15FKb2FRYaC7MRehpXd22JX6fv5ii2xazU3IfLucBrb4Vj+wAJrBY4wA3x/CFkAa
 as1VmxLkgoJEWa3M71tQOJBC8+QqkRb++PRUI3aadt2H4hXHfx3AmBuKkVroeS8o
 0BDhWGiTW4AqXUajkQcTc/mKV2x6h83V3DLyBRL1iC3+7qaBVhPNtxW+v6ln0Ce1
 FRG2I9LZp+RtWrVVyIPsa03V2V5OD7PTIBXK6TYtuqL+3uu7TNNc+UySvqDHWLL+
 Zo2ogpq//kZbjMdntNVhDEj12LW3zG05dtNuFEeJeuPM28yiXXtoWDmI49RAUQ4v
 RN6SwEXnKWehwG+YITYavV6gfHWlXdZ7grgCMHyViF/s9khBp7AGxbRzR0JXgXqL
 ko1Ojpbq2mdvjwGFQfde4MAqAxM3FPxdxGVLrgi+lgGTsEKv6IzrTo28teyAM81O
 D6cH0ldY90w=
 =6y+F
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache and cachefiles fixes from David Howells:
 "Misc fixes:

   - Fix an assertion failure at fs/cachefiles/xattr.c:138 caused by a
     race between a cache object lookup failing and someone attempting
     to reenable that object, thereby triggering an update of the
     object's attributes.

   - Fix an assertion failure at fs/fscache/operation.c:449 caused by a
     split atomic subtract and atomic read that allows a race to happen.

   - Fix a leak of backing pages when simultaneously reading the same
     page from the same object from two or more threads.

   - Fix a hang due to a race between a cache object being discarded and
     the corresponding cookie being reenabled.

  There are also some minor cleanups:

   - Cast an enum value to a different enum type to prevent clang from
     generating a warning. This shouldn't cause any sort of change in
     the emitted code.

   - Use ktime_get_real_seconds() instead of get_seconds(). This is just
     used to uniquify a filename for an object to be placed in the
     graveyard. Objects placed there are deleted by cachfilesd in
     userspace immediately thereafter.

   - Remove an initialised, but otherwise unused variable. This should
     have been entirely optimised away anyway"

* tag 'fscache-fixes-20181130' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache, cachefiles: remove redundant variable 'cache'
  cachefiles: avoid deprecated get_seconds()
  cachefiles: Explicitly cast enumerated type in put_object
  fscache: fix race between enablement and dropping of object
  cachefiles: Fix page leak in cachefiles_read_backing_file while vmscan is active
  fscache: Fix race in fscache_op_complete() due to split atomic_sub & read
  cachefiles: Fix an assertion failure when trying to update a failed object
2018-11-30 18:32:33 -08:00
Paul Burton
6584297b78
MAINTAINERS: Update linux-mips mailing list address
The linux-mips.org infrastructure has been unreliable recently & nobody
with sufficient access to fix it is around to do so. As a result we're
moving away from it, and part of this is migrating our mailing list to
kernel.org.

Replace all instances of linux-mips@linux-mips.org in MAINTAINERS with
the shiny new linux-mips@vger.kernel.org address.

The new list is now being archived on kernel.org at
https://lore.kernel.org/linux-mips/ which also holds the history of the
old linux-mips.org list.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-mips@linux-mips.org
2018-11-30 15:19:36 -08:00
Pan Bian
164f7e5867 ocfs2: fix potential use after free
ocfs2_get_dentry() calls iput(inode) to drop the reference count of
inode, and if the reference count hits 0, inode is freed.  However, in
this function, it then reads inode->i_generation, which may result in a
use after free bug.  Move the put operation later.

Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com
Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
95feeabb77 mm/khugepaged: fix the xas_create_range() error path
collapse_shmem()'s xas_nomem() is very unlikely to fail, but it is
rightly given a failure path, so move the whole xas_create_range() block
up before __SetPageLocked(new_page): so that it does not need to
remember to unlock_page(new_page).

Add the missing mem_cgroup_cancel_charge(), and set (currently unused)
result to SCAN_FAIL rather than SCAN_SUCCEED.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261531200.2275@eggly.anvils
Fixes: 77da9389b9 ("mm: Convert collapse_shmem to XArray")
Signed-off-by: Hugh Dickins <hughd@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
06a5e1268a mm/khugepaged: collapse_shmem() do not crash on Compound
collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before
it holds page lock of the first page, racing truncation then extension
might conceivably have inserted a hugepage there already.  Fail with the
SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or
otherwise mishandling the unexpected hugepage - though later we might
code up a more constructive way of handling it, with SCAN_SUCCESS.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
87c460a0bd mm/khugepaged: collapse_shmem() without freezing new_page
khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0).  Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.

Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.

One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.

The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong.  So simply
eliminate the freezing of the new_page.

Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before.  They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
042a308248 mm/khugepaged: minor reorderings in collapse_shmem()
Several cleanups in collapse_shmem(): most of which probably do not
really matter, beyond doing things in a more familiar and reassuring
order.  Simplify the failure gotos in the main loop, and on success
update stats while interrupts still disabled from the last iteration.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
2af8ff2918 mm/khugepaged: collapse_shmem() remember to clear holes
Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp
flags khugepaged uses to allocate a huge page - in all common cases it
would just be a waste of effort - so collapse_shmem() must remember to
clear out any holes that it instantiates.

The obvious place to do so, where they are put into the page cache tree,
is not a good choice: because interrupts are disabled there.  Leave it
until further down, once success is assured, where the other pages are
copied (before setting PageUptodate).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
aaa52e3400 mm/khugepaged: fix crashes due to misaccounted holes
Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent
hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by
clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later
closed and unlinked.

khugepaged's collapse_shmem() was forgetting to update mapping->nrpages
on the rollback path, after it had added but then needs to undo some
holes.

There is indeed an irritating asymmetry between shmem_charge(), whose
callers want it to increment nrpages after successfully accounting
blocks, and shmem_uncharge(), when __delete_from_page_cache() already
decremented nrpages itself: oh well, just add a comment on that to them
both.

And shmem_recalc_inode() is supposed to be called when the accounting is
expected to be in balance (so it can deduce from imbalance that reclaim
discarded some pages): so change shmem_charge() to update nrpages
earlier (though it's rare for the difference to matter at all).

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
Fixes: 800d8c63b2 ("shmem: add huge pages support")
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
701270fa19 mm/khugepaged: collapse_shmem() stop if punched or truncated
Huge tmpfs testing showed that although collapse_shmem() recognizes a
concurrently truncated or hole-punched page correctly, its handling of
holes was liable to refill an emptied extent.  Add check to stop that.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
Fixes: f3f0e1d215 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
006d3ff27e mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that
__split_huge_page() was using i_size_read() while holding the irq-safe
lru_lock and page tree lock, but the 32-bit i_size_read() uses an
irq-unsafe seqlock which should not be nested inside them.

Instead, read the i_size earlier in split_huge_page_to_list(), and pass
the end offset down to __split_huge_page(): all while holding head page
lock, which is enough to prevent truncation of that extent before the
page tree lock has been taken.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
Fixes: baa355fd33 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
173d9d9fd3 mm/huge_memory: splitting set mapping+index before unfreeze
Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s
VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).

Move the setting of mapping and index up before the page_ref_unfreeze()
in __split_huge_page_tail() to fix this: so that a page cache lookup
cannot get a reference while the tail's mapping and index are unstable.

In fact, might as well move them up before the smp_wmb(): I don't see an
actual need for that, but if I'm missing something, this way round is
safer than the other, and no less efficient.

You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is
misplaced, and should be left until after the trylock_page(); but left as
is has not crashed since, and gives more stringent assurance.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
Fixes: e9b61f1985 ("thp: reintroduce split_huge_page()")
Requires: 605ca5ede7 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:15 -08:00
Hugh Dickins
906f9cdfc2 mm/huge_memory: rename freeze_page() to unmap_page()
The term "freeze" is used in several ways in the kernel, and in mm it
has the particular meaning of forcing page refcount temporarily to 0.
freeze_page() is just too confusing a name for a function that unmaps a
page: rename it unmap_page(), and rename unfreeze_page() remap_page().

Went to change the mention of freeze_page() added later in mm/rmap.c,
but found it to be incorrect: ordinary page reclaim reaches there too;
but the substance of the comment still seems correct, so edit it down.

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes: e9b61f1985 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Li Zhijian
7c0950d455 initramfs: clean old path before creating a hardlink
sys_link() can fail due to the new path already existing.  This case
ofen occurs when we use a concated initrd, for example:

1) prepare a basic rootfs, it contains a regular files rc.local
lizhijian@:~/yocto-tiny-i386-2016-04-22$ cat etc/rc.local
 #!/bin/sh
 echo "Running /etc/rc.local..."
yocto-tiny-i386-2016-04-22$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rootfs.cgz

2) create a extra initrd which also includes a etc/rc.local
lizhijian@:~/lkp-x86_64/etc$ echo "append initrd" >rc.local
lizhijian@:~/lkp/lkp-x86_64/etc$ cat rc.local
append initrd
lizhijian@:~/lkp/lkp-x86_64/etc$ ln rc.local rc.local.hardlink
append initrd
lizhijian@:~/lkp/lkp-x86_64/etc$ stat rc.local rc.local.hardlink
  File: 'rc.local'
  Size: 14        	Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d	Inode: 11296086    Links: 2
Access: (0664/-rw-rw-r--)  Uid: ( 1002/lizhijian)   Gid: ( 1002/lizhijian)
Access: 2018-11-15 16:08:28.654464815 +0800
Modify: 2018-11-15 16:07:57.514903210 +0800
Change: 2018-11-15 16:08:24.180228872 +0800
 Birth: -
  File: 'rc.local.hardlink'
  Size: 14        	Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d	Inode: 11296086    Links: 2
Access: (0664/-rw-rw-r--)  Uid: ( 1002/lizhijian)   Gid: ( 1002/lizhijian)
Access: 2018-11-15 16:08:28.654464815 +0800
Modify: 2018-11-15 16:07:57.514903210 +0800
Change: 2018-11-15 16:08:24.180228872 +0800
 Birth: -

lizhijian@:~/lkp/lkp-x86_64$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rc-local.cgz
lizhijian@:~/lkp/lkp-x86_64$ gzip -dc ../rc-local.cgz | cpio -t
.
etc
etc/rc.local.hardlink <<< it will be extracted first at this initrd
etc/rc.local

3) concate 2 initrds and boot
lizhijian@:~/lkp$ cat rootfs.cgz rc-local.cgz >concate-initrd.cgz
lizhijian@:~/lkp$ qemu-system-x86_64 -nographic -enable-kvm -cpu host -smp 1 -m 1024 -kernel ~/lkp/linux/arch/x86/boot/bzImage -append "console=ttyS0 earlyprint=ttyS0 ignore_loglevel" -initrd ./concate-initr.cgz -serial stdio -nodefaults

In this case, sys_link(2) will fail and return -EEXIST, so we can only get
the rc.local at rootfs.cgz instead of rc-local.cgz

[akpm@linux-foundation.org: move code to avoid forward declaration]
Link: http://lkml.kernel.org/r/1542352368-13299-1-git-send-email-lizhijian@cn.fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Philip Li <philip.li@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Li Zhijian <zhijianx.li@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Anders Roxell
903e8ff867 kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
Since __sanitizer_cov_trace_pc() is marked as notrace, function calls in
__sanitizer_cov_trace_pc() shouldn't be traced either.
ftrace_graph_caller() gets called for each function that isn't marked
'notrace', like canonicalize_ip().  This is the call trace from a run:

[  139.644550]  ftrace_graph_caller+0x1c/0x24
[  139.648352]  canonicalize_ip+0x18/0x28
[  139.652313]  __sanitizer_cov_trace_pc+0x14/0x58
[  139.656184]  sched_clock+0x34/0x1e8
[  139.659759]  trace_clock_local+0x40/0x88
[  139.663722]  ftrace_push_return_trace+0x8c/0x1f0
[  139.667767]  prepare_ftrace_return+0xa8/0x100
[  139.671709]  ftrace_graph_caller+0x1c/0x24

Rework so that check_kcov_mode() and canonicalize_ip() that are called
from __sanitizer_cov_trace_pc() are also marked as notrace.

Link: http://lkml.kernel.org/r/20181128081239.18317-1-anders.roxell@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signen-off-by: Anders Roxell <anders.roxell@linaro.org>
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Johannes Weiner
e0c274472d psi: make disabling/enabling easier for vendor kernels
Mel Gorman reports a hackbench regression with psi that would prohibit
shipping the suse kernel with it default-enabled, but he'd still like
users to be able to opt in at little to no cost to others.

With the current combination of CONFIG_PSI and the psi_disabled bool set
from the commandline, this is a challenge.  Do the following things to
make it easier:

1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
   to enable CONFIG_PSI in their kernel but leave the feature disabled
   unless a user requests it at boot-time.

   To avoid double negatives, rename psi_disabled= to psi=.

2. Make psi_disabled a static branch to eliminate any branch costs
   when the feature is disabled.

In terms of numbers before and after this patch, Mel says:

: The following is a comparision using CONFIG_PSI=n as a baseline against
: your patch and a vanilla kernel
:
:                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
:                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
: Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
: Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
: Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
: Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
: Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
: Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
: Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
: Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
: Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
:
: It's showing that the vanilla kernel takes a hit (as the bisection
: indicated it would) and that disabling PSI by default is reasonably
: close in terms of performance for this particular workload on this
: particular machine so;

Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Alexey Dobriyan
dbd4af5474 proc: fixup map_files test on arm
https://bugs.linaro.org/show_bug.cgi?id=3782

Turns out arm doesn't permit mapping address 0, so try minimum virtual
address instead.

Link: http://lkml.kernel.org/r/20181113165446.GA28157@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Rafael David Tinoco <rafael.tinoco@linaro.org>
Tested-by: Rafael David Tinoco <rafael.tinoco@linaro.org>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Qian Cai
8de456cf87 debugobjects: avoid recursive calls with kmemleak
CONFIG_DEBUG_OBJECTS_RCU_HEAD does not play well with kmemleak due to
recursive calls.

fill_pool
  kmemleak_ignore
    make_black_object
      put_object
        __call_rcu (kernel/rcu/tree.c)
          debug_rcu_head_queue
            debug_object_activate
              debug_object_init
                fill_pool
                  kmemleak_ignore
                    make_black_object
                      ...

So add SLAB_NOLEAKTRACE to kmem_cache_create() to not register newly
allocated debug objects at all.

Link: http://lkml.kernel.org/r/20181126165343.2339-1-cai@gmx.us
Signed-off-by: Qian Cai <cai@gmx.us>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Andrea Arcangeli
dcf7fe9d89 userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
Set the page dirty if VM_WRITE is not set because in such case the pte
won't be marked dirty and the page would be reclaimed without writepage
(i.e.  swapout in the shmem case).

This was found by source review.  Most apps (certainly including QEMU)
only use UFFDIO_COPY on PROT_READ|PROT_WRITE mappings or the app can't
modify the memory in the first place.  This is for correctness and it
could help the non cooperative use case to avoid unexpected data loss.

Link: http://lkml.kernel.org/r/20181126173452.26955-6-aarcange@redhat.com
Reviewed-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Andrea Arcangeli
e2a50c1f64 userfaultfd: shmem: add i_size checks
With MAP_SHARED: recheck the i_size after taking the PT lock, to
serialize against truncate with the PT lock.  Delete the page from the
pagecache if the i_size_read check fails.

With MAP_PRIVATE: check the i_size after the PT lock before mapping
anonymous memory or zeropages into the MAP_PRIVATE shmem mapping.

A mostly irrelevant cleanup: like we do the delete_from_page_cache()
pagecache removal after dropping the PT lock, the PT lock is a spinlock
so drop it before the sleepable page lock.

Link: http://lkml.kernel.org/r/20181126173452.26955-5-aarcange@redhat.com
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Andrea Arcangeli
29ec90660d userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
After the VMA to register the uffd onto is found, check that it has
VM_MAYWRITE set before allowing registration.  This way we inherit all
common code checks before allowing to fill file holes in shmem and
hugetlbfs with UFFDIO_COPY.

The userfaultfd memory model is not applicable for readonly files unless
it's a MAP_PRIVATE.

Link: http://lkml.kernel.org/r/20181126173452.26955-4-aarcange@redhat.com
Fixes: ff62a34210 ("hugetlb: implement memfd sealing")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Reported-by: Jann Horn <jannh@google.com>
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Andrea Arcangeli
5b51072e97 userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
Userfaultfd did not create private memory when UFFDIO_COPY was invoked
on a MAP_PRIVATE shmem mapping.  Instead it wrote to the shmem file,
even when that had not been opened for writing.  Though, fortunately,
that could only happen where there was a hole in the file.

Fix the shmem-backed implementation of UFFDIO_COPY to create private
memory for MAP_PRIVATE mappings.  The hugetlbfs-backed implementation
was already correct.

This change is visible to userland, if userfaultfd has been used in
unintended ways: so it introduces a small risk of incompatibility, but
is necessary in order to respect file permissions.

An app that uses UFFDIO_COPY for anything like postcopy live migration
won't notice the difference, and in fact it'll run faster because there
will be no copy-on-write and memory waste in the tmpfs pagecache
anymore.

Userfaults on MAP_PRIVATE shmem keep triggering only on file holes like
before.

The real zeropage can also be built on a MAP_PRIVATE shmem mapping
through UFFDIO_ZEROPAGE and that's safe because the zeropage pte is
never dirty, in turn even an mprotect upgrading the vma permission from
PROT_READ to PROT_READ|PROT_WRITE won't make the zeropage pte writable.

Link: http://lkml.kernel.org/r/20181126173452.26955-3-aarcange@redhat.com
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Andrea Arcangeli
9e368259ad userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
Patch series "userfaultfd shmem updates".

Jann found two bugs in the userfaultfd shmem MAP_SHARED backend: the
lack of the VM_MAYWRITE check and the lack of i_size checks.

Then looking into the above we also fixed the MAP_PRIVATE case.

Hugh by source review also found a data loss source if UFFDIO_COPY is
used on shmem MAP_SHARED PROT_READ mappings (the production usages
incidentally run with PROT_READ|PROT_WRITE, so the data loss couldn't
happen in those production usages like with QEMU).

The whole patchset is marked for stable.

We verified QEMU postcopy live migration with guest running on shmem
MAP_PRIVATE run as well as before after the fix of shmem MAP_PRIVATE.
Regardless if it's shmem or hugetlbfs or MAP_PRIVATE or MAP_SHARED, QEMU
unconditionally invokes a punch hole if the guest mapping is filebacked
and a MADV_DONTNEED too (needed to get rid of the MAP_PRIVATE COWs and
for the anon backend).

This patch (of 5):

We internally used EFAULT to communicate with the caller, switch to
ENOENT, so EFAULT can be used as a non internal retval.

Link: http://lkml.kernel.org/r/20181126173452.26955-2-aarcange@redhat.com
Fixes: 4c27fe4c4c ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Hugh Dickins <hughd@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00
Luis Chamberlain
5618cf031f lib/test_kmod.c: fix rmmod double free
We free the misc device string twice on rmmod; fix this.  Without this
we cannot remove the module without crashing.

Link: http://lkml.kernel.org/r/20181124050500.5257-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[4.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30 14:56:14 -08:00