Commit Graph

966273 Commits

Christoph Hellwig
250eec9e39 Documentation/hdio: fix up obscure bd_contains references
bd_contains is an implementation detail and should not be mentioned in
userspace API documentation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 08:18:57 -06:00
Jeffle Xu
3aab91774b block: remove unused BLK_QC_T_EAGAIN flag
commit 7b6620d7db ("block: remove REQ_NOWAIT_INLINE") removed the
REQ_NOWAIT_INLINE related code, but the diff wasn't applied to
blk_types.h somehow.

Then commit 2771cefeac ("block: remove the REQ_NOWAIT_INLINE flag")
removed the REQ_NOWAIT_INLINE flag while the BLK_QC_T_EAGAIN flag still
remains.

Fixes: 7b6620d7db ("block: remove REQ_NOWAIT_INLINE")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 07:54:50 -06:00
Jens Axboe
163090c14a Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.10/drivers
Pull MD updates from Song.

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md/raid10: improve discard request for far layout
  md/raid10: improve raid10 discard request
  md/raid10: pull codes that wait for blocked dev into one function
  md/raid10: extend r10bio devs to raid disks
  md: add md_submit_discard_bio() for submitting discard bio
  md: Simplify code with existing definition RESYNC_SECTORS in raid10.c
  md/raid5: reallocate page array after setting new stripe_size
  md/raid5: resize stripe_head when reshape array
  md/raid5: let multiple devices of stripe_head share page
  md/raid6: let async recovery function support different page offset
  md/raid6: let syndrome computor support different page offset
  md/raid5: convert to new xor compution interface
  md/raid5: add new xor function to support different page offset
  md/raid5: make async_copy_data() to support different page offset
  md/raid5: add a new member of offset into r5dev
  md: only calculate blocksize once and use i_blocksize()
2020-09-25 07:48:20 -06:00
Jens Axboe
f3cd485050 io_uring: ensure open/openat2 name is cleaned on cancelation
If we cancel these requests, we'll leak the memory associated with the
filename. Add them to the table of ops that need cleaning, if
REQ_F_NEED_CLEANUP is set.
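
A minimal sketch of the idea (helper name assumed, not the actual
fs/io_uring.c code):

  /* Drop the filename grabbed at prep time when an open/openat2
   * request is torn down without executing. */
  static void io_cleanup_open(struct io_kiocb *req)
  {
      if ((req->flags & REQ_F_NEED_CLEANUP) && req->open.filename)
          putname(req->open.filename);
  }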

Cc: stable@vger.kernel.org
Fixes: e62753e4e2 ("io_uring: call statx directly")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 07:41:46 -06:00
Jon Hunter
61f764c307 eeprom: at24: Support custom device names for AT24 EEPROMs
By using the label property, a more descriptive name can be populated
for the AT24 EEPROM's NVMEM device. Update the AT24 driver to check
whether the label property is present and, if so, use it as the name
for the NVMEM device. Note that when the 'label' property is present,
we do not want the NVMEM driver to append the 'devid' to the name, so
nvmem_config.id is initialised to NVMEM_DEVID_NONE.
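
A minimal sketch of the probe-time logic, using the generic device
property helpers (surrounding code assumed):

  if (device_property_present(dev, "label")) {
      /* A labelled EEPROM gets its own name, without a devid suffix. */
      nvmem_config.id = NVMEM_DEVID_NONE;
      err = device_property_read_string(dev, "label",
                                        &nvmem_config.name);
      if (err)
          return err;
  } else {
      nvmem_config.id = NVMEM_DEVID_AUTO;
      nvmem_config.name = dev_name(dev);
  }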

Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2020-09-25 15:33:04 +02:00
Sean Christopherson
8d214c4816 KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled.
Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are
toggled inadvertently skipped the MMU context reset due to the mask
of bits that triggers PDPTR loads also being used to trigger MMU context
resets.
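
A hedged sketch of the decoupling in kvm_set_cr4(), where pdptr_bits
stands for the existing mask of bits that force a PDPTR reload:

  unsigned long mmu_role_bits = pdptr_bits | X86_CR4_SMAP | X86_CR4_PKE;

  if ((cr4 ^ old_cr4) & mmu_role_bits)
      kvm_mmu_reset_context(vcpu);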

Fixes: 427890aff8 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
Fixes: cb957adb4e ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
Cc: Jim Mattson <jmattson@google.com>
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25 08:56:35 -04:00
Srinivas Kandagatla
709ec3f7fc slimbus: qcom-ngd-ctrl: disable ngd in qmi server down callback
The QMI new-server notification enables the NGD, but the delete-server
notification does not disable it.

This can lead to multiple instances of the NGD being enabled, so make
sure that we disable the NGD in the delete-server callback.
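
A sketch of the delete-server callback (struct layout and helper names
assumed from the driver, not verbatim):

  static void qcom_slim_ngd_qmi_del_server(struct qmi_handle *hdl,
                                           struct qmi_service *service)
  {
      struct qcom_slim_ngd_qmi *qmi =
          container_of(hdl, struct qcom_slim_ngd_qmi, svc_event_hdl);
      struct qcom_slim_ngd_ctrl *ctrl =
          container_of(qmi, struct qcom_slim_ngd_ctrl, qmi);

      qmi->svc_info.sq_node = 0;
      qmi->svc_info.sq_port = 0;

      /* Mirror the enable done in the new-server notification. */
      qcom_slim_ngd_enable(ctrl, false);
  }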

Fixes: 917809e228 ("slimbus: ngd: Add qcom SLIMBus NGD driver")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200925095520.27316-4-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25 14:41:51 +02:00
Srinivas Kandagatla
df2c471c4a slimbus: core: do not enter to clock pause mode in core
Let the controller logic decide when to enter clock pause mode.
Entering pause mode during unregistration does not really make sense,
as the controller is going down completely at that point.

Fixes: 4b14e62ad3 ("slimbus: Add support for 'clock-pause' feature")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200925095520.27316-3-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25 14:41:50 +02:00
Srinivas Kandagatla
f97769fde6 slimbus: core: check get_addr before removing laddr ida
A logical address can be assigned either by the SLIMBus controller or
by the core. The core uses an IDA in cases where the get_addr callback
is not provided by the controller.
The core already has this check while allocating from the IDA, but the
check is missing during absence reporting. This patch fixes that.
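
A sketch of the absence-reporting fix (field names assumed):

  /* Only core-assigned logical addresses live in the IDA; skip the
   * remove when the controller supplied its own get_laddr callback. */
  if (!ctrl->get_laddr)
      ida_simple_remove(&ctrl->laddr_ida, sbdev->laddr);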

Fixes: 46a2bb5a7f ("slimbus: core: Add slim controllers support")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20200925095520.27316-2-srinivas.kandagatla@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25 14:41:50 +02:00
Greg Kroah-Hartman
5a487cf7ef Merge tag 'misc-habanalabs-next-2020-09-25' of git://people.freedesktop.org/~gabbayo/linux into char-misc-next

Oded writes:

This tag contains the following changes for kernel 5.10-rc1:

- release the kernel context object after we reset the device. This is
  needed to prevent a race where the firmware still has some in-flight
  transactions that require the kernel context (and its memory mappings) to
  be alive.

- replace constant numbers with defines in QMAN initialization of GAUDI

- correct an error message text and add a few debug messages to help debug
  issues that happen during context open and close.

* tag 'misc-habanalabs-next-2020-09-25' of git://people.freedesktop.org/~gabbayo/linux:
  habanalabs/gaudi: configure QMAN LDMA registers properly
  habanalabs: add notice of device not idle
  habanalabs: add debug messages for opening/closing context
  habanalabs: release kernel context after hw_fini
  habanalabs: correct an error message
2020-09-25 14:36:48 +02:00
Arnd Bergmann
1c954540c0 staging: vchiq: avoid mixing kernel and user pointers
As found earlier, there is a problem in the create_pagelist() function
that takes a pointer argument that either points into vmalloc space or
into user space, with the pointer value controlled by user space allowing
a malicious user to trick the driver into accessing the kernel instead.

Avoid this problem by adding another function argument and passing
kernel pointers separately from user pointers. This makes it possible
to rely on sparse to point out invalid conversions, and it prevents
user space from faking a kernel pointer.
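
A minimal sketch of the split-argument idea (hypothetical helper, not
the driver's actual code); exactly one of the two source pointers is
set, so sparse can flag any __user misuse:

  static int fill_from_either(void *dst, const void *kbuf,
                              const void __user *ubuf, size_t count)
  {
      if (kbuf) {
          memcpy(dst, kbuf, count);             /* kernel source */
          return 0;
      }
      if (copy_from_user(dst, ubuf, count))     /* user source */
          return -EFAULT;
      return 0;
  }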

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200925114424.2647144-2-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25 14:34:03 +02:00
Arnd Bergmann
4184da4f31 staging: vchiq: fix __user annotations
My earlier patches caused some new sparse warnings, but it turns out
that a number of those are actual bugs, or at least suspicious code.

Adding __user annotations to the data structures that are defined in
uapi headers helps avoid the new warnings, but that causes a different
set of warnings to show up, as some of these structures are used both
inside the kernel and at the user interface, but store pointers to
different things in each case.

Duplicating the vchiq_service_params and vchiq_completion_data structures
in turn takes care of most of those, and then it turns out that there
is a 'data' pointer that can be any of a __user address, a dma_addr_t
and a kernel pointer in vmalloc space at times.

I'm trying to annotate these as best I can without changing behavior,
but there still seems to be a serious bug when user space passes
a valid vmalloc space address instead of a user pointer. Adding
comments in the code there, and leaving the warnings in place that
seem to correspond to actual bugs.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200925114424.2647144-1-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-25 14:34:03 +02:00
Peter Oskolkov
f166b111e0 rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
Based on Google-internal RSEQ work done by Paul Turner and Andrew
Hunter.

This patch adds a selftest for MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
The test quite often fails without the previous patch in this
patchset, but consistently passes with it.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-3-posk@google.com
2020-09-25 14:23:27 +02:00
Peter Oskolkov
ea366dd79c rseq/selftests,x86_64: Add rseq_offset_deref_addv()
This patch adds rseq_offset_deref_addv() function to
tools/testing/selftests/rseq/rseq-x86.h, to be used in a selftest in
the next patch in the patchset.

Once an architecture adds support for this function, it should define
"RSEQ_ARCH_HAS_OFFSET_DEREF_ADDV".

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-2-posk@google.com
2020-09-25 14:23:27 +02:00
Peter Oskolkov
2a36ab717e rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
This patchset is based on Google-internal RSEQ work done by Paul
Turner and Andrew Hunter.

When working with per-CPU RSEQ-based memory allocations, it is
sometimes important to make sure that a global memory location is no
longer accessed from RSEQ critical sections. For example, there can be
two per-CPU lists, one is "active" and accessed per-CPU, while another
one is inactive and worked on asynchronously "off CPU" (e.g.  garbage
collection is performed). Then at some point the two lists are
swapped, and a fast RCU-like mechanism is required to make sure that
the previously active list is no longer accessed.

This patch introduces such a mechanism: in short, membarrier() syscall
issues an IPI to a CPU, restarting a potentially active RSEQ critical
section on the CPU.
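
A hedged userspace usage sketch, assuming a kernel with this command
(registration first, as with the other private expedited commands):

  #include <linux/membarrier.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int membarrier(int cmd, unsigned int flags, int cpu_id)
  {
      return syscall(__NR_membarrier, cmd, flags, cpu_id);
  }

  /* Call once at startup, then fence before reclaiming the old list. */
  int rseq_fence_setup(void)
  {
      return membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ,
                        0, 0);
  }

  int rseq_fence(void)
  {
      return membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ, 0, 0);
  }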

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lkml.kernel.org/r/20200923233618.2572849-1-posk@google.com
2020-09-25 14:23:27 +02:00
Barry Song
233e7aca4c sched/fair: Use dst group while checking imbalance for NUMA balancer
Barry Song noted the following

	Something is wrong. In find_busiest_group(), we are checking if
	src has higher load, however, in task_numa_find_cpu(), we are
	checking if dst will have higher load after balancing. It seems
	it is not sensible to check src.

	It may cause a wrong imbalance value, for example,

	if dst_running = env->dst_stats.nr_running + 1 results in 3 or
	above, and src_running = env->src_stats.nr_running - 1 results
	in 1;

	The current code computes the imbalance as 0 since src_running
	is smaller than 2. This is inconsistent with the load balancer.

Basically, in find_busiest_group(), the NUMA imbalance is ignored if moving
a task "from an almost idle domain" to a "domain with spare capacity". This
patch forbids movement "from a misplaced domain" to "an almost idle domain"
as that is closer to what the CPU load balancer expects.

This patch is not a universal win. The old behaviour was intended to allow
a task from an almost idle NUMA node to migrate to its preferred node if
the destination had capacity but there are corner cases.  For example,
a NAS compute load could be parallelised to use 1/3rd of available CPUs
but not all those potential tasks are active at all times, allowing
this logic to trigger. An obvious example is specjbb 2005 running
various numbers of warehouses on a 2-socket box with 80 CPUs.

specjbb
                               5.9.0-rc4              5.9.0-rc4
                                 vanilla        dstbalance-v1r1
Hmean     tput-1     46425.00 (   0.00%)    43394.00 *  -6.53%*
Hmean     tput-2     98416.00 (   0.00%)    96031.00 *  -2.42%*
Hmean     tput-3    150184.00 (   0.00%)   148783.00 *  -0.93%*
Hmean     tput-4    200683.00 (   0.00%)   197906.00 *  -1.38%*
Hmean     tput-5    236305.00 (   0.00%)   245549.00 *   3.91%*
Hmean     tput-6    281559.00 (   0.00%)   285692.00 *   1.47%*
Hmean     tput-7    338558.00 (   0.00%)   334467.00 *  -1.21%*
Hmean     tput-8    340745.00 (   0.00%)   372501.00 *   9.32%*
Hmean     tput-9    424343.00 (   0.00%)   413006.00 *  -2.67%*
Hmean     tput-10   421854.00 (   0.00%)   434261.00 *   2.94%*
Hmean     tput-11   493256.00 (   0.00%)   485330.00 *  -1.61%*
Hmean     tput-12   549573.00 (   0.00%)   529959.00 *  -3.57%*
Hmean     tput-13   593183.00 (   0.00%)   555010.00 *  -6.44%*
Hmean     tput-14   588252.00 (   0.00%)   599166.00 *   1.86%*
Hmean     tput-15   623065.00 (   0.00%)   642713.00 *   3.15%*
Hmean     tput-16   703924.00 (   0.00%)   660758.00 *  -6.13%*
Hmean     tput-17   666023.00 (   0.00%)   697675.00 *   4.75%*
Hmean     tput-18   761502.00 (   0.00%)   758360.00 *  -0.41%*
Hmean     tput-19   796088.00 (   0.00%)   798368.00 *   0.29%*
Hmean     tput-20   733564.00 (   0.00%)   823086.00 *  12.20%*
Hmean     tput-21   840980.00 (   0.00%)   856711.00 *   1.87%*
Hmean     tput-22   804285.00 (   0.00%)   872238.00 *   8.45%*
Hmean     tput-23   795208.00 (   0.00%)   889374.00 *  11.84%*
Hmean     tput-24   848619.00 (   0.00%)   966783.00 *  13.92%*
Hmean     tput-25   750848.00 (   0.00%)   903790.00 *  20.37%*
Hmean     tput-26   780523.00 (   0.00%)   962254.00 *  23.28%*
Hmean     tput-27  1042245.00 (   0.00%)   991544.00 *  -4.86%*
Hmean     tput-28  1090580.00 (   0.00%)  1035926.00 *  -5.01%*
Hmean     tput-29   999483.00 (   0.00%)  1082948.00 *   8.35%*
Hmean     tput-30  1098663.00 (   0.00%)  1113427.00 *   1.34%*
Hmean     tput-31  1125671.00 (   0.00%)  1134175.00 *   0.76%*
Hmean     tput-32   968167.00 (   0.00%)  1250286.00 *  29.14%*
Hmean     tput-33  1077676.00 (   0.00%)  1060893.00 *  -1.56%*
Hmean     tput-34  1090538.00 (   0.00%)  1090933.00 *   0.04%*
Hmean     tput-35   967058.00 (   0.00%)  1107421.00 *  14.51%*
Hmean     tput-36  1051745.00 (   0.00%)  1210663.00 *  15.11%*
Hmean     tput-37  1019465.00 (   0.00%)  1351446.00 *  32.56%*
Hmean     tput-38  1083102.00 (   0.00%)  1064541.00 *  -1.71%*
Hmean     tput-39  1232990.00 (   0.00%)  1303623.00 *   5.73%*
Hmean     tput-40  1175542.00 (   0.00%)  1340943.00 *  14.07%*
Hmean     tput-41  1127826.00 (   0.00%)  1339492.00 *  18.77%*
Hmean     tput-42  1198313.00 (   0.00%)  1411023.00 *  17.75%*
Hmean     tput-43  1163733.00 (   0.00%)  1228253.00 *   5.54%*
Hmean     tput-44  1305562.00 (   0.00%)  1357886.00 *   4.01%*
Hmean     tput-45  1326752.00 (   0.00%)  1406061.00 *   5.98%*
Hmean     tput-46  1339424.00 (   0.00%)  1418451.00 *   5.90%*
Hmean     tput-47  1415057.00 (   0.00%)  1381570.00 *  -2.37%*
Hmean     tput-48  1392003.00 (   0.00%)  1421167.00 *   2.10%*
Hmean     tput-49  1408374.00 (   0.00%)  1418659.00 *   0.73%*
Hmean     tput-50  1359822.00 (   0.00%)  1391070.00 *   2.30%*
Hmean     tput-51  1414246.00 (   0.00%)  1392679.00 *  -1.52%*
Hmean     tput-52  1432352.00 (   0.00%)  1354020.00 *  -5.47%*
Hmean     tput-53  1387563.00 (   0.00%)  1409563.00 *   1.59%*
Hmean     tput-54  1406420.00 (   0.00%)  1388711.00 *  -1.26%*
Hmean     tput-55  1438804.00 (   0.00%)  1387472.00 *  -3.57%*
Hmean     tput-56  1399465.00 (   0.00%)  1400296.00 *   0.06%*
Hmean     tput-57  1428132.00 (   0.00%)  1396399.00 *  -2.22%*
Hmean     tput-58  1432385.00 (   0.00%)  1386253.00 *  -3.22%*
Hmean     tput-59  1421612.00 (   0.00%)  1371416.00 *  -3.53%*
Hmean     tput-60  1429423.00 (   0.00%)  1389412.00 *  -2.80%*
Hmean     tput-61  1396230.00 (   0.00%)  1351122.00 *  -3.23%*
Hmean     tput-62  1418396.00 (   0.00%)  1383098.00 *  -2.49%*
Hmean     tput-63  1409918.00 (   0.00%)  1374662.00 *  -2.50%*
Hmean     tput-64  1410236.00 (   0.00%)  1376216.00 *  -2.41%*
Hmean     tput-65  1396405.00 (   0.00%)  1364418.00 *  -2.29%*
Hmean     tput-66  1395975.00 (   0.00%)  1357326.00 *  -2.77%*
Hmean     tput-67  1392986.00 (   0.00%)  1349642.00 *  -3.11%*
Hmean     tput-68  1386541.00 (   0.00%)  1343261.00 *  -3.12%*
Hmean     tput-69  1374407.00 (   0.00%)  1342588.00 *  -2.32%*
Hmean     tput-70  1377513.00 (   0.00%)  1334654.00 *  -3.11%*
Hmean     tput-71  1369319.00 (   0.00%)  1334952.00 *  -2.51%*
Hmean     tput-72  1354635.00 (   0.00%)  1329005.00 *  -1.89%*
Hmean     tput-73  1350933.00 (   0.00%)  1318942.00 *  -2.37%*
Hmean     tput-74  1351714.00 (   0.00%)  1316347.00 *  -2.62%*
Hmean     tput-75  1352198.00 (   0.00%)  1309974.00 *  -3.12%*
Hmean     tput-76  1349490.00 (   0.00%)  1286064.00 *  -4.70%*
Hmean     tput-77  1336131.00 (   0.00%)  1303684.00 *  -2.43%*
Hmean     tput-78  1308896.00 (   0.00%)  1271024.00 *  -2.89%*
Hmean     tput-79  1326703.00 (   0.00%)  1290862.00 *  -2.70%*
Hmean     tput-80  1336199.00 (   0.00%)  1291629.00 *  -3.34%*

The performance at the mid-point is better but not universally better. The
patch is a mixed bag depending on the workload, machine and overall
levels of utilisation. Sometimes it's better (sometimes much better),
other times it is worse (sometimes much worse). Given that there isn't
a universally good decision in this area and more people seem to prefer
the patch, it may be best to keep the LB decisions consistent and
revisit imbalance handling when the load balancer code changes settle down.

Jirka Hladky added the following observation.

	Our results are mostly in line with what you see. We observe
	big gains (20-50%) when the system is loaded to 1/3 of the
	maximum capacity and mixed results at the full load - some
	workloads benefit from the patch at the full load, others not,
	but performance changes at the full load are mostly within the
	noise of results (+/-5%). Overall, we think this patch is helpful.

[mgorman@techsingularity.net: Rewrote changelog]
Fixes: fb86f5b211 ("sched/numa: Use similar logic to the load balancer for moving between domains with spare capacity")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200921221849.GI3179@techsingularity.net
2020-09-25 14:23:26 +02:00
Vincent Guittot
6e7499135d sched/fair: Reduce busy load balance interval
The busy_factor, which increases the load balance interval when a CPU
is busy, is set to 32 by default. This value generates some huge LB
intervals on a large system like the THX2, made of 2 nodes x 28 cores
x 4 threads. For such a system, the interval increases from 112ms to
3584ms at MC level, and from 228ms to 7168ms at NUMA level.

Even on smaller systems, a lower busy factor has shown improvement in
the fair distribution of the running time, so reduce it for all.
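
For reference, a sketch of the default change in sd_init()
(kernel/sched/topology.c; surrounding initializer elided):

  .busy_factor        = 16,    /* previously 32 */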

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-5-vincent.guittot@linaro.org
2020-09-25 14:23:26 +02:00
Vincent Guittot
e4d32e4d54 sched/fair: Minimize concurrent LBs between domain level
Sched domains tend to trigger the load balance loop simultaneously,
but the larger domains often need more time to collect statistics. This
slowness makes the larger domain try to detach tasks from a rq whereas
tasks have already migrated somewhere else at a sub-domain level. This
is not a real problem for idle LB because the period of smaller domains
will increase while their CPUs are busy, which leaves time for the
higher levels to pull tasks. But it becomes a problem when all CPUs are
already busy, because then all domains stay synced when they trigger
their LB.

A simple way to minimize simultaneous LB of all domains is to decrement
the busy interval by 1 jiffy. Because of the busy_factor, the interval
of a larger domain will no longer be a multiple of the smaller ones.
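
A hedged sketch of get_sd_balance_interval() with the offset applied:

  static inline unsigned long
  get_sd_balance_interval(struct sched_domain *sd, int cpu_busy)
  {
      unsigned long interval = sd->balance_interval;

      if (cpu_busy)
          interval *= sd->busy_factor;

      /* scale ms to jiffies */
      interval = msecs_to_jiffies(interval);

      /* Keep busy periods of different levels from being exact
       * multiples of each other. */
      if (cpu_busy)
          interval -= 1;

      return clamp(interval, 1UL, max_load_balance_interval);
  }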

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-4-vincent.guittot@linaro.org
2020-09-25 14:23:26 +02:00
Vincent Guittot
2208cdaa56 sched/fair: Reduce minimal imbalance threshold
The 25% default imbalance threshold for the DIE and NUMA domains is
large enough to generate significant unfairness between threads. A
typical example is the case of 11 threads running on 2x4 CPUs. The
imbalance of 20% between the 2 groups of 4 cores is just low enough not
to trigger the load balance between the 2 groups. We will always have
the same 6 threads on one group of 4 CPUs and the other 5 threads on
the other group of CPUs. With fair time sharing in each group, we end
up with +20% running time for the group of 5 threads.

Decrease the imbalance threshold for the overloaded case, where we use
the load to balance tasks and to ensure fair time sharing.
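
A sketch of the default change in sd_init() (kernel/sched/topology.c;
surrounding initializer elided):

  .imbalance_pct      = 117,    /* previously 125, i.e. a 25% threshold */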

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Hillf Danton <hdanton@sina.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-3-vincent.guittot@linaro.org
2020-09-25 14:23:26 +02:00
Vincent Guittot
5a7f555904 sched/fair: Relax constraint on task's load during load balance
Some use cases, like 9 always-running tasks on 8 CPUs, can't be
balanced, and the load balancer currently migrates the waiting task
between the CPUs in an almost random manner. The success of a rq
pulling a task depends on the value of nr_balance_failed of its domains
and on its ability to detach the task faster than others. This behavior
results in an unfair distribution of the running time between tasks
because some CPUs will run most of the time, if not always, the same
task whereas others will share their time between several tasks.

Instead of using nr_balance_failed as a boolean to relax the condition
for detaching a task, the LB will use nr_balance_failed to relax the
threshold between the task's load and the imbalance. This mechanism
prevents the same rq or domain from always winning the load balance
fight.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lkml.kernel.org/r/20200921072424.14813-2-vincent.guittot@linaro.org
2020-09-25 14:23:25 +02:00
Xianting Tian
fe7491580d sched/fair: Remove the force parameter of update_tg_load_avg()
In fair.c, sometimes update_tg_load_avg(cfs_rq, 0) is used and
sometimes update_tg_load_avg(cfs_rq, false) is used.
update_tg_load_avg() has the parameter force, but the current code
never passes 1 or true for it, so remove the force parameter.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200924014755.36253-1-tian.xianting@h3c.com
2020-09-25 14:23:25 +02:00
Xunlei Pang
df3cb4ea1f sched/fair: Fix wrong cpu selecting from isolated domain
We've met problems where tasks with a full cpumask (e.g. after being
put into a cpuset or set to full affinity) were occasionally migrated
to our isolated CPUs in the production environment.

After some analysis, we found that it is due to the current
select_idle_smt() not considering the sched_domain mask.

Steps to reproduce on my 31-CPU hyperthreaded machine:
1. with boot parameter: "isolcpus=domain,2-31"
   (thread lists: 0,16 and 1,17)
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. some threads will be migrated to the isolated cpu16~17.

Fix it by checking the valid domain mask in select_idle_smt().
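
A hedged sketch of select_idle_smt() with the added span check:

  static int select_idle_smt(struct task_struct *p,
                             struct sched_domain *sd, int target)
  {
      int cpu;

      if (!static_branch_likely(&sched_smt_present))
          return -1;

      for_each_cpu(cpu, cpu_smt_mask(target)) {
          if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
              !cpumask_test_cpu(cpu, sched_domain_span(sd)))
              continue;    /* skip siblings outside the domain */
          if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
              return cpu;
      }

      return -1;
  }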

Fixes: 10e2f1acd0 ("sched/core: Rewrite and improve select_idle_siblings()")
Reported-by: Wetp Zhang <wetp.zy@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1600930127-76857-1-git-send-email-xlpang@linux.alibaba.com
2020-09-25 14:23:25 +02:00
YueHaibing
51bd5121c4 sched: Remove unused inline function uclamp_bucket_base_value()
There are no callers in the tree, so it can be removed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lkml.kernel.org/r/20200922132410.48440-1-yuehaibing@huawei.com
2020-09-25 14:23:25 +02:00
Daniel Bristot de Oliveira
2586af1ac1 sched/rt: Disable RT_RUNTIME_SHARE by default
The RT_RUNTIME_SHARE sched feature enables the sharing of rt_runtime
between CPUs, allowing a CPU to run a real-time task up to 100% of the
time while leaving more space for non-real-time tasks to run on the CPU
that lent rt_runtime.

The problem is that a CPU can easily borrow enough rt_runtime to allow
a spinning rt-task to run forever, starving per-cpu tasks like kworkers,
which are non-real-time by design.

This patch disables RT_RUNTIME_SHARE by default, avoiding this problem.
The feature will still be present for users that want to enable it,
though.
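
The change itself is a one-line default flip in
kernel/sched/features.h (sketch):

  SCHED_FEAT(RT_RUNTIME_SHARE, false)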

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Wei Wang <wvw@google.com>
Link: https://lkml.kernel.org/r/b776ab46817e3db5d8ef79175fa0d71073c051c7.1600697903.git.bristot@redhat.com
2020-09-25 14:23:24 +02:00
Lucas Stach
46fcc4b00c sched/deadline: Fix stale throttling on de-/boosted tasks
When a boosted task gets throttled, what normally happens is that it's
immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
runtime and clears the dl_throttled flag. There is a special case however:
if the throttling happened on sched-out and the task has been deboosted in
the meantime, the replenish is skipped as the task will return to its
normal scheduling class. This leaves the task with the dl_throttled flag
set.

Now if the task gets boosted up to the deadline scheduling class again
while it is sleeping, it's still in the throttled state. The normal wakeup
however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
actually place it on the rq. Thus we end up with a task that is runnable,
but not actually on the rq; neither an immediate replenishment happens,
nor is the replenishment timer set up, so the task is stuck in
forever-throttled limbo.

Clear the dl_throttled flag before dropping back to the normal scheduling
class to fix this issue.
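
A hedged sketch of the shape of the fix on the deboost path
(surrounding logic elided):

  if (!dl_prio(p->normal_prio)) {
      /* Returning to the original class: skipping the replenish is
       * fine, but the throttled flag must not survive the switch. */
      p->dl.dl_throttled = 0;
      return;
  }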

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200831110719.2126930-1-l.stach@pengutronix.de
2020-09-25 14:23:24 +02:00
Vincent Guittot
8e0e0eda6a sched/numa: Use runnable_avg to classify node
Use runnable_avg to classify numa node state similarly to what is done for
normal load balancer. This helps to ensure that numa and normal balancers
use the same view of the state of the system.

Large arm64 system: 2 nodes / 224 CPUs:

  hackbench -l (256000/#grp) -g #grp

  grp    tip/sched/core         +patchset              improvement
  1      14.008(+/- 4.99 %)     13.800(+/- 3.88 %)     1.48 %
  4       4.340(+/- 5.35 %)      4.283(+/- 4.85 %)     1.33 %
  16      3.357(+/- 0.55 %)      3.359(+/- 0.54 %)    -0.06 %
  32      3.050(+/- 0.94 %)      3.039(+/- 1.06 %)     0.38 %
  64      2.968(+/- 1.85 %)      3.006(+/- 2.92 %)    -1.27 %
  128     3.290(+/-12.61 %)      3.108(+/- 5.97 %)     5.51 %
  256     3.235(+/- 3.95 %)      3.188(+/- 2.83 %)     1.45 %

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20200921072959.16317-1-vincent.guittot@linaro.org
2020-09-25 14:23:24 +02:00
Liu Shixin
b942fc0319 RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()
sizeof() here should be applied so that it gives the size of the
pointed-to data, even if that data is itself a pointer.
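
An illustration of the pattern (not the driver's code):

  const char **names;

  names = kcalloc(num, sizeof(*names), GFP_KERNEL);    /* element size */
  /* kcalloc(num, sizeof(names), ...) would use the size of the array
   * pointer itself: the same value here, but wrong as soon as the
   * element type changes. */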

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-25 09:17:42 -03:00
Krzysztof Kozlowski
197bbae9ed arm64: dts: ti: k3-j721e-common-proc-board: align GPIO hog names with dtschema
The convention for node names is to use hyphens, not underscores.
dtschema for pca95xx expects GPIO hog names to end with a 'hog' suffix.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Nishanth Menon <nm@ti.com>
Link: https://lore.kernel.org/r/20200916155715.21009-7-krzk@kernel.org
2020-09-25 06:59:31 -05:00
Ofir Bitton
25121d9804 habanalabs/gaudi: configure QMAN LDMA registers properly
LDMA registers are configured with fixed values. We add a new set of
defines which gives the configuration a proper meaning.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:21 +03:00
Oded Gabbay
eab1f6e7b0 habanalabs: add notice of device not idle
The device should be idle after a context is closed. If not, print a
notice.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:21 +03:00
Oded Gabbay
3c3aa5dbd6 habanalabs: add debug messages for opening/closing context
During debugging of error we sometimes need to know whether the error
happened when a user context was open. Add debug prints when opening and
closing user contexts.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:20 +03:00
Oded Gabbay
9e2e8fc7d6 habanalabs: release kernel context after hw_fini
Some engines use resources that belong to the kernel context (e.g. MMU
mappings). In case halt-engines doesn't work properly due to a H/W
restriction, we need to make sure the kernel context lives on until
after hw_fini. hw_fini resets the ASIC; after that, no engine is alive
and we can safely close the kernel context.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:20 +03:00
Oded Gabbay
fc6121e961 habanalabs: correct an error message
We don't try to allocate huge pages here, so remove the word 'huge'.

Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-25 14:44:20 +03:00
Krzysztof Kozlowski
5e7998b801 ARM: dts: am3874: iceboard: fix GPIO expander reset GPIOs
Correct the property for reset GPIOs of the GPIO expander.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-09-25 14:38:31 +03:00
Krzysztof Kozlowski
ccd73f07e0 ARM: dts: am335x: t335: align GPIO hog names with dtschema
The convention for node names is to use hyphens, not underscores.
dtschema for pca95xx expects GPIO hog names to end with a 'hog' suffix.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-09-25 14:37:30 +03:00
Krzysztof Kozlowski
97b16ed103 ARM: dts: am335x: lxm: fix PCA9539 GPIO expander properties
The PCA9539 GPIO expander requires GPIO controller properties to operate
properly.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-09-25 14:36:11 +03:00
Grygorii Strashko
8cbe7afc92 ARM: dts: am437x-l4: drop legacy cpsw dt node
All am437x boards have been converted to use new driver, so drop legacy
cpsw dt node.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-09-25 14:31:12 +03:00
Grygorii Strashko
aff7e5038c ARM: dts: am437x: switch to new cpsw switch drv
The dual_mac mode is preserved the same way between the legacy and new
drivers, and a one-port device works the same as one dual_mac port, so
it's safe to switch drivers. Switch all am437x boards to use the new
cpsw switch driver. Those boards have either 2 external ports wired and
configured in dual_mac mode by default, or only 1 external port.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-09-25 14:31:05 +03:00
Grygorii Strashko
7bf8f37aea ARM: dts: am437x-l4: add dt node for new cpsw switchdev driver
Add DT node for the new cpsw switchdev based driver.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2020-09-25 14:30:58 +03:00
Krzysztof Kozlowski
94d4c3cffe mmc: sdhci-s3c: hide forward declaration of of_device_id behind CONFIG_OF
The struct of_device_id is not defined with !CONFIG_OF, so its forward
declaration should be hidden as well. This addresses the following
clang compile warning:

  drivers/mmc/host/sdhci-s3c.c:464:34: warning: tentative array definition assumed to have one element
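
A sketch of the fix:

  #ifdef CONFIG_OF
  static const struct of_device_id sdhci_s3c_dt_match[];
  #endif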

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200925072532.10272-1-krzk@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-09-25 13:30:52 +02:00
Krzysztof Kozlowski
0cb231f1e0 mmc: sdhci: fix indentation mistakes
Fix inconsistent indenting, reported by Smatch:

  drivers/mmc/host/sdhci-esdhc-imx.c:1380 sdhci_esdhc_imx_hwinit() warn: inconsistent indenting
  drivers/mmc/host/sdhci-sprd.c:390 sdhci_sprd_request_done() warn: inconsistent indenting

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200923153739.30327-2-krzk@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-09-25 13:24:02 +02:00
Krzysztof Kozlowski
6b28f2c4da mmc: moxart: remove unneeded check for drvdata
The 'struct mmc_host *mmc' comes from drvdata set at the end of probe,
so it cannot be NULL. The code already dereferences it a few lines
before the check, with mmc_priv(). This also fixes a smatch warning:

  drivers/mmc/host/moxart-mmc.c:692 moxart_remove() warn: variable dereferenced before check 'mmc' (see line 688)

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20200923153739.30327-1-krzk@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-09-25 13:24:02 +02:00
Wolfram Sang
fbb31330f9 mmc: renesas_sdhi: drop local flag for tuning
The MMC core has now a generic check if some tuning is in progress. Its
protected area is a bit larger than the custom one in this driver but we
concluded that this works equally well for the intended case. So, drop
the local flag and switch to the generic one.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20200922172253.4458-1-wsa@kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-09-25 13:24:02 +02:00
Qinglang Miao
8dae6a249c mmc: rtsx_usb_sdmmc: simplify the return expression of sd_change_phase()
Simplify the return expression.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Link: https://lore.kernel.org/r/20200921131042.92340-1-miaoqinglang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-09-25 13:24:02 +02:00
Wolfram Sang
3439c588c2 mmc: core: document mmc_hw_reset()
Add documentation for mmc_hw_reset to make sure the intended use case is
clear.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20200918215446.65654-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-09-25 13:24:02 +02:00
Jason Gerecke
d9216d753b HID: wacom: Avoid entering wacom_wac_pen_report for pad / battery
It has recently been reported that the "heartbeat" report from devices
like the 2nd-gen Intuos Pro (PTH-460, PTH-660, PTH-860) or the 2nd-gen
Bluetooth-enabled Intuos tablets (CTL-4100WL, CTL-6100WL) can cause the
driver to send a spurious BTN_TOUCH=0 once per second in the middle of
drawing. This can result in broken lines while drawing on Chrome OS.

The source of the issue has been traced back to a change which modified
the driver to only call `wacom_wac_pad_report()` once per report instead
of once per collection. As part of this change, pad-handling code was
removed from `wacom_wac_collection()` under the assumption that the
`WACOM_PEN_FIELD` and `WACOM_TOUCH_FIELD` checks would not be satisfied
when a pad or battery collection was being processed.

To be clear, the macros `WACOM_PAD_FIELD` and `WACOM_PEN_FIELD` do not
currently check exclusive conditions. In fact, most "pad" fields will
also appear to be "pen" fields simply due to their presence inside of
a Digitizer application collection. Because of this, the removal of
the check from `wacom_wac_collection()` just causes pad / battery
collections to instead trigger a call to `wacom_wac_pen_report()`
instead. The pen report function in turn resets the tip switch state
just prior to exiting, resulting in the observed BTN_TOUCH=0 symptom.

To correct this, we restore a version of the `WACOM_PAD_FIELD` check
in `wacom_wac_collection()` and return early. This effectively prevents
pad / battery collections from being reported until the very end of the
report as originally intended.
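
A hedged sketch of the restored check in wacom_wac_collection():

  /* Leave pad/battery collections for the end-of-report handler
   * instead of falling through to the pen path. */
  if (WACOM_PAD_FIELD(field))
      return 0;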

Fixes: d4b8efeb46 ("HID: wacom: generic: Correct pad syncing")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Tested-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2020-09-25 13:14:16 +02:00
Alex Hung
b226faab4e ACPI: video: use ACPI backlight for HP 635 Notebook
The default backlight interface is AMD's radeon_bl0 which does not
work on this system, so use the ACPI backlight interface on it
instead.

BugLink: https://bugs.launchpad.net/bugs/1894667
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Alex Hung <alex.hung@canonical.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-25 12:59:24 +02:00
Andy Shevchenko
399e08f1f0 MAINTAINERS: Use my kernel.org address for Intel PMIC work
Use my kernel.org address for Intel PMIC work. While here,
upgrade status to maintainer of PMIC MFD devices.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-25 12:52:33 +02:00
Hanjun Guo
32c6f3ffa0 ACPI: APD: Clean up header file include statements
Make the included header files appear in alphabetical order and
remove the unnecessary header file inclusions.

Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-25 12:48:11 +02:00
Hanjun Guo
ee2bc5d2c4 ACPI: APD: Remove unnecessary APD_ADDR() macro stub
Since APD_ADDR(desc) is only used when CONFIG_X86_AMD_PLATFORM_DEVICE
or CONFIG_ARM64 is set, no need for the stub APD_ADDR(desc) definition
covering the other cases, so remove it.

Also update the comments for #endif.

Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-25 12:48:11 +02:00