Reduce the module/device probe error into a mere debug to hide issues
where the initial modeset is failing (after lies told by hw probe) and
the system hangs with a livelock in cleaning up the failed commit.
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=210619
Fixes: b3bf99daae ("drm/i915/display: Defer initial modeset until after GGTT is initialised")
Fixes: ccc9e67ab2 ("drm/i915/display: Defer initial modeset until after GGTT is initialised")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201210230741.17140-1-chris@chris-wilson.co.uk
- Check the correct variable in selftest (Dan)
- Propagate error from canceled submit due to context closure (Chris)
- Ignore repeated attempts to suspend request flow across reset (Chris)
- Cancel the preemption timeout on responding to it (Chris)
- Fix unsigned compared against 0 (Colin)
- Compute the correct slice count for VDSC on DP (Manasi)
- Declar gen9 has 64 mocs entries (Chris)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEbSBwaO7dZQkcLOKj+mJfZA7rE8oFAl/RYfgACgkQ+mJfZA7r
E8qAFAf+KdbI2Is7i8OdHOz5sP97QplFVn9JY3rfXbi+ekIvZD9F/9yLyoav2a5Q
SLScUMlxbXUMyBUav3gGxJLcSIyHZRdq75rlZ5zjcF7GaNUu+VBEahbPiQ2tLC5A
IO98SkHxktnHGyHQqELIH4vzm+3zwQhB8X4moob3bhAJPjSITQkUUsUNARljoSJ+
i0fYsh6QAmhzMTcSMaRQIgk7FJYvTk2//4BeD2viErnbjMqANg8hXcVN0UsWlVgO
OEj6QcIEOU9MWCkm57GmsjIjhYd18xg0IJ/OjJSl9S4mC2u89/ZEK6khr0+iJZE4
dIh5gZmCmkn2s3kfpzWaaVRej6R6SQ==
=1+fd
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-fixes-2020-12-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Fixes for VDSC/DP, selftests, shmem_utils, preemption, submission, and gt reset:
- Check the correct variable in selftest (Dan)
- Propagate error from canceled submit due to context closure (Chris)
- Ignore repeated attempts to suspend request flow across reset (Chris)
- Cancel the preemption timeout on responding to it (Chris)
- Fix unsigned compared against 0 (Colin)
- Compute the correct slice count for VDSC on DP (Manasi)
- Declar gen9 has 64 mocs entries (Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201209235010.GA10554@intel.com
amd-drm-fixes-5.10-2020-12-09:
amdgpu:
- Fan fix for CI asics
- Fix a warning in possible_crtcs
- Build fix for when debugfs is disabled
- Display overflow fix
- Display watermark fixes for Renoir
- SDMA 5.2 fix
- Stolen vga memory regression fix
- Power profile fixes
- Fix a regression from removal of GEM and PRIME callbacks
amdkfd:
- Fix a memory leak in dmabuf import
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201210034848.18108-1-alexander.deucher@amd.com
The "COMPUTE" was wrongly spelled as "CUSTOM".
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.9.x
For BOs imported from outside of amdgpu, setting of amdgpu_gem_object_funcs
was missing in amdgpu_dma_buf_create_obj. Fix by refactoring BO creation
and amdgpu_gem_object_funcs setting into single function called
from both code paths.
Fixes: d693def4fd ("drm: Remove obsolete GEM and PRIME callbacks from struct drm_driver")
v2: Use use amdgpu_gem_object_create() directly
v3: fix warning
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If we need to keep the stolen vga memory, make sure it is
at least as big as the legacy vga size.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
When using old WORKLOAD_PPLIB setting in smu10.h, there is problem that
it can't be able to switch to mak gpu clk during compute workload.
It needs to update WORKLOAD_PPLIB setting to fix this issue.
Signed-off-by: Changfeng <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Release dmabuf reference before returning from kfd_ioctl_import_dmabuf.
amdgpu_amdkfd_gpuvm_import_dmabuf takes a reference to the underlying
GEM BO and doesn't keep the reference to the dmabuf wrapper.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
each sdma instance fw_version and feature_version
should be set right value when asic type isn't
between SIENNA_CICHILD and CHIP_DIMGREY_CAVEFISH
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
Without additional HostVM Latency, Renoir takes 2us longer to exit
self-refresh. This causes underflow in certain cases.
[How]
Add table for Renoir with updated sr exit latencies for WM set A.
Signed-off-by: Sung Lee <sung.lee@amd.com>
Reviewed-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
At very high pixel clock, bandwidth calculation exceeds 32 bit size
and overflow value. This causes the resulting selection of link rate
to be inaccurate.
[How]
Change order of operation and use fixed point to deal with integer
accuracy. Also address bug found when forcing link rate.
Signed-off-by: Chris Park <Chris.Park@amd.com>
Reviewed-by: Wenjing Liu <Wenjing.Liu@amd.com>
Acked-by: Eryk Brol <eryk.brol@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is still a warning when CONFIG_DEBUG_FS is disabled:
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1145:13: error: 'amdgpu_ras_debugfs_create_ctrl_node' defined but not used [-Werror=unused-function]
1145 | static void amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *adev)
Change the code again to make the compiler actually drop
this code but not warn about it.
Fixes: ae2bf61ff3 ("drm/amdgpu: guard ras debugfs creation/removal based on CONFIG_DEBUG_FS")
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To avoid a recently added warning:
Bogus possible_crtcs: [ENCODER:65:TMDS-65] possible_crtcs=0xf (full crtc mask=0x7)
WARNING: CPU: 3 PID: 439 at drivers/gpu/drm/drm_mode_config.c:617 drm_mode_config_validate+0x178/0x200 [drm]
In this case the warning is harmless, but confusing to users.
Fixes: 0df1082374 ("drm: Validate encoder->possible_crtcs")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=209123
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
We checked the table size against a hardcoded number of entries, and
that number was excluding the special mocs registers at the end.
Fixes: 777a7717d6 ("drm/i915/gt: Program mocs:63 for cache eviction on gen9")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.3+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201127102540.13117-1-chris@chris-wilson.co.uk
(cherry picked from commit 444fbf5d70)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[backported and updated the Fixes sha]
This patch fixes the slice count computation algorithm
for calculating the slice count based on Peak pixel rate
and the max slice width allowed on the DSC engines.
We need to ensure slice count > min slice count req
as per DP spec based on peak pixel rate and that it is
greater than min slice count based on the max slice width
advertised by DPCD. So use max of these two.
In the prev patch we were using min of these 2 causing it
to violate the max slice width limitation causing a blank
screen on 8K@60.
Fixes: d9218c8f6c ("drm/i915/dp: Add helpers for Compressed BPP and Slice Count for DSC")
Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204205804.25225-1-manasi.d.navare@intel.com
(cherry picked from commit d371d6ea92)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Currently the check that the unsigned size_t variable i is >= 0
is always true because the unsigned variable will never be negative,
causing the loop to run forever. Fix this by changing the
pre-decrement check to a zero check on i followed by a decrement of i.
Addresses-Coverity: ("Unsigned compared against 0")
Fixes: bfed6708d6 ("drm/i915: use vmap in shmem_pin_map")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002170354.94627-1-colin.king@canonical.com
(cherry picked from commit e70956a249)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We currently presume that the engine reset is successful, cancelling the
expired preemption timer in the process. However, engine resets can
fail, leaving the timeout still pending and we will then respond to the
timeout again next time the tasklet fires. What we want is for the
failed engine reset to be promoted to a full device reset, which is
kicked by the heartbeat once the engine stops processing events.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 3a7a92aba8 ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-2-chris@chris-wilson.co.uk
(cherry picked from commit d997e240ce)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Before reseting the engine, we suspend the execution of the guilty
request, so that we can continue execution with a new context while we
slowly compress the captured error state for the guilty context. However,
if the reset fails, we will promptly attempt to reset the same request
again, and discover the ongoing capture. Ignore the second attempt to
suspend and capture the same request.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1168
Fixes: 32ff621fd7 ("drm/i915/gt: Allow temporary suspension of inflight requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201204151234.19729-1-chris@chris-wilson.co.uk
(cherry picked from commit b969540500)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In the course of discovering and closing many races with context closure
and execbuf submission, since commit 61231f6bd0 ("drm/i915/gem: Check
that the context wasn't closed during setup") we started checking that
the context was not closed by another userspace thread during the execbuf
ioctl. In doing so we cancelled the inflight request (by telling it to be
skipped), but kept reporting success since we do submit a request, albeit
one that doesn't execute. As the error is known before we return from the
ioctl, we can report the error we detect immediately, rather than leave
it on the fence status. With the immediate propagation of the error, it
is easier for userspace to handle.
Fixes: 61231f6bd0 ("drm/i915/gem: Check that the context wasn't closed during setup")
Testcase: igt/gem_ctx_exec/basic-close-race
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v5.7+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201203103432.31526-1-chris@chris-wilson.co.uk
(cherry picked from commit ba38b79eae)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There is a copy and paste bug in this code. It's supposed to check
"obj2" instead of checking "obj" a second time.
Fixes: 80f0b679d6 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/8ilneOcJAjwqU4t@mwand
(cherry picked from commit 14f2d7604f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Here are some small driver fixes, and one "large" revert, for 5.10-rc7.
They include:
- revert mei patch from 5.10-rc1 that was using a reserved
userspace value. It will be resubmitted once the proper id
has been assigned by the virtio people.
- habanalabs fixes found by the fall-through audit from Gustavo
- speakup driver fixes for reported issues
- fpga config build fix for reported issue.
All of these except the revert have been in linux-next with no reported
issues. The revert is "clean" and just removes a previously-added
driver, so no real issue there.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX8zr3g8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk4LwCgz1KKObSoSy8TdbSneDlAZuJlhS0AniwKDQ8g
EobiywN+/Q3KP5jGF4Km
=8rD3
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes, and one "large" revert, for
5.10-rc7.
They include:
- revert mei patch from 5.10-rc1 that was using a reserved userspace
value. It will be resubmitted once the proper id has been assigned
by the virtio people.
- habanalabs fixes found by the fall-through audit from Gustavo
- speakup driver fixes for reported issues
- fpga config build fix for reported issue.
All of these except the revert have been in linux-next with no
reported issues. The revert is "clean" and just removes a
previously-added driver, so no real issue there"
* tag 'char-misc-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "mei: virtio: virtualization frontend driver"
fpga: Specify HAS_IOMEM dependency for FPGA_DFL
habanalabs: put devices before driver removal
habanalabs: free host huge va_range if not used
speakup: Reject setting the speakup line discipline outside of speakup
Here are two tty core fixes for 5.10-rc7.
They resolve some reported locking issues in the tty core. While they
have not been in a released linux-next yet, they have passed all of the
0-day bot testing as well as the submitter's testing.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX8zqVA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn6aACgiBghNkA6SJIjM1mcPCI/Nkvs3qoAn0g+l8Z1
zATUgLS4xqAHnYXUfHpy
=eB5C
-----END PGP SIGNATURE-----
Merge tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are two tty core fixes for 5.10-rc7.
They resolve some reported locking issues in the tty core. While they
have not been in a released linux-next yet, they have passed all of
the 0-day bot testing as well as the submitter's testing"
* tag 'tty-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix ->session locking
tty: Fix ->pgrp locking in tiocspgrp()
Here are some small USB fixes for 5.10-rc7 that resolve a number of
reported issues, and add some new device ids.
Nothing major here, but these solve some problems that people were
having with the 5.10-rc tree:
- reverts for USB storage dma settings that broke working
devices
- thunderbolt use-after-free fix
- cdns3 driver fixes
- gadget driver userspace copy fix
- new device ids
All of these except for the reverts have been in linux-next with no
reported issues. The reverts are "clean" and were tested by Hans, as
well as passing the 0-day tests.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX8zrIA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykf6wCgrsGpsGBvmUuL+ntikJ2G7dUIZp0AoM0f/cbC
wxbI9WQm4iZKLsl0t0v9
=Cua9
-----END PGP SIGNATURE-----
Merge tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.10-rc7 that resolve a number of
reported issues, and add some new device ids.
Nothing major here, but these solve some problems that people were
having with the 5.10-rc tree:
- reverts for USB storage dma settings that broke working devices
- thunderbolt use-after-free fix
- cdns3 driver fixes
- gadget driver userspace copy fix
- new device ids
All of these except for the reverts have been in linux-next with no
reported issues. The reverts are "clean" and were tested by Hans, as
well as passing the 0-day tests"
* tag 'usb-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: Use local copy of descriptors for userspace copy
usb: ohci-omap: Fix descriptor conversion
Revert "usb-storage: fix sdev->host->dma_dev"
Revert "uas: fix sdev->host->dma_dev"
Revert "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
USB: serial: kl5kusb105: fix memleak on open
USB: serial: ch341: sort device-id entries
USB: serial: ch341: add new Product ID for CH341A
USB: serial: option: fix Quectel BG96 matching
usb: cdns3: core: fix goto label for error path
usb: cdns3: gadget: clear trb->length as zero after preparing every trb
usb: cdns3: Fix hardware based role switch
USB: serial: option: add support for Thales Cinterion EXS82
USB: serial: option: add Fibocom NL668 variants
thunderbolt: Fix use-after-free in remove_unplugged_switch()
- Make the AMD L3 QoS code and data priorization enable/disable mechanism
work correctly. The control bit was only set/cleared on one of the CPUs
in a L3 domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one from
Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes the UV
hubs to be identified wrongly. The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the uprobes
code which fails to handle repeated prefixes properly. The aggregate
size of the prefixes can be larger than the bytes array but the code
blindly iterated over the aggregate size beyond the array boundary.
Add a macro to handle this case properly and use it at the affected
places.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/M2GoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUGeD/0TxQyVIZTJjY/ExDYvQZeSWgvFVon9
A/QcwkgtkWzYKI3YThgxBtSNKhHPP8eQ9QK8ErcaQKEHU5TdXVEOwHgTm1A6OGat
0+M9/EWEe7tTu+cLpu7esQ+1VbSvEcFXZbljbhCKrShlBlsIjFG4eCPAprSw9yjI
RSgfXKZu4NmHVS6nfJTwmIIaTeLQ6U3b7b1D5s/66slBFScqnLbRNhABVbHbos4F
pl/lxDCFOddy2YbEojHjjGqMA7oxPav7c0nYFOM/zG+wAqfEjbqOxReT31bGQPi2
XT9K4JEqDqILo0KnhV4GsYoWAhes3BtmsJ9IoZ7IijsMriYl80mD9URAORidJ4PX
28Ckk9V/DlE8uDrAnBDcWDSoKlg78mhVV7V9L6v43teg/gJfSZNROtNDBmqRmwG4
Op2NJfzJITtaxVQuSZRkSs8rzGv+QUfaM1sBUQ+Oz4KYeIjjA7G2MAOECrzIAWKB
GWc5toYRVS6oGT+RbZhSxZYoh8ASoGJ2MrL8K4OV4RqEqHHcXcih0WmmljtsDIFI
td4FHHH6fghIb9S6iYKiApd6k2qKa33mwJwa/xZOoIrv0w5xT0WDJnxT60gu/Mec
YDkqhmA009CNSD2G4oNRNF5MH7gp34UII+25jOGatbVh+5DDPYs+5Jnh/DR7jssR
PryAG9ER7UUb6w==
=rNBq
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Make the AMD L3 QoS code and data priorization enable/disable
mechanism work correctly.
The control bit was only set/cleared on one of the CPUs in a L3
domain, but it has to be modified on all CPUs in the domain. The
initial documentation was not clear about this, but the updated one
from Oct 2020 spells it out.
- Fix an off by one in the UV platform detection code which causes
the UV hubs to be identified wrongly.
The chip revisions start at 1 not at 0.
- Fix a long standing bug in the evaluation of prefixes in the
uprobes code which fails to handle repeated prefixes properly.
The aggregate size of the prefixes can be larger than the bytes
array but the code blindly iterated over the aggregate size beyond
the array boundary. Add a macro to handle this case properly and
use it at the affected places"
* tag 'x86-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/insn-eval: Use new for_each_insn_prefix() macro to loop over prefixes bytes
x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
x86/platform/uv: Fix UV4 hub revision adjustment
x86/resctrl: Fix AMD L3 QOS CDP enable/disable
- Add recursion protection to another callchain invoked from
x86_pmu_stop() which can recurse back into x86_pmu_stop(). The first
attempt to fix this missed this extra code path.
- Use the already filtered status variable to check for PEBS counter
overflow bits and not the unfiltered full status read from
IA32_PERF_GLOBAL_STATUS which can have unrelated bits check which
would be evaluated incorrectly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/M1lwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeKPEACo/xfvFgp/rlSIjh12OW7XTyKqKMVq
C8TvMFbzn3Pzx89RyJctyyi1d5xYz6ic3uXkV0yGJ1g9fkJdzhgCY/byoTd/nWor
/r6RRtw5Ah3werXlESbZWTb6691eqGO4TpWZMAwdDeV1eRAc7ZSoOLJiY0o4HVWd
AE2aX9/psikHBAeMzMHn5W4pELl0s6s6aX9OJLFFZeroA3UW3bWOHu8o/iylmld9
wnQPvGe+kk35ZBza8/kbBJv3ZasUBPXy1+9Z1p1aE8dmd+HrYnw9vS3El9dgtX1Y
ADKttAD3S3j/PbBOXOnIRoSTM7Vvc5gviLyPfw6erBIMtw/OBFSrRD7IbVJpyZmn
yysBgacu6pzeLXRZ4ZktQ2GPeAgdiZwsqwuvuQGd5t1zfdNkZEEI7aK87CrHCYOk
zMt7vNK/q5xF9wt9C3cYzka/Sp7UA10zqP31/fv2fOZb4dcANTtm9reS/fwqd67X
HYhYiks4z1wxqTOf76wBpXau9ZtQLb20MmC3S6IknDF39pODO1sGcWMJSB0AgYUE
3125ZTxrmlgv+U7S5qBHKB5ot5wvNUexiDD0g0o2N9x4dCQqErmvwUDWWUoqq35A
g4tqODudwM4aU05leWO/piZEBr4xxAiRke9r8WJZNHAo0C7yHjjmXRtZYDkSf2pV
U0gyy4dWTsXKKw==
=YbCo
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for performance monitoring on X86:
- Add recursion protection to another callchain invoked from
x86_pmu_stop() which can recurse back into x86_pmu_stop(). The
first attempt to fix this missed this extra code path.
- Use the already filtered status variable to check for PEBS counter
overflow bits and not the unfiltered full status read from
IA32_PERF_GLOBAL_STATUS which can have unrelated bits check which
would be evaluated incorrectly"
* tag 'perf-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Check PEBS status correctly
perf/x86/intel: Fix a warning on x86_pmu_stop() with large PEBS
- Make multiqueue devices which use the managed interrupt affinity
infrastructure work on PowerPC/Pseries. PowerPC does not use the
generic infrastructure for setting up PCI/MSI interrupts and the
multiqueue changes failed to update the legacy PCI/MSI infrastructure.
Make this work by passing the affinity setup information down to the
mapping and allocation functions.
- Move Jason Cooper from MAINTAINERS to CREDITS as his mail is bouncing
and he's not reachable. We hope all is well with him and say thanks
for his work over the years.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/M1GwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoW8xD/4uG/0ayYgSdRf4nXcyXu4JKoHV5oK5
y7IWY9s04fqTFbVO2fRaD1hBYHavWfdV80obP8dJio1g6R1BqzZiEVUmCdWI0tHJ
recAsGYxqPrNj9soHEZ7ZmuGX6VhuzQj57srU+lhzsqk+88uY/n1d/TlrHCH7miU
0cfBSoolP2l2p6UYHvXfH2wk1hRHg8sySOfxGSp6KSrewoOwAOT2CCNX8gIcmy1n
dUsJaHEFzU547p55zDs5DTHfM0yJdsqqUpdxvpiZWpZhsIzoQvd8taiH7/uaRGqd
yJI4sMWudJUGGas2Vq0yjG6L0uAJ7M+kjqodJzn0hAKq6MhAIKaPMEbpPx9TuZYb
zZg9ce5o4LwzTphNPmcEMCjpPKRGNiEbcl1XY4qhQWnBvuOb1mIBFV4+6srd+Lpg
o7kEt+XyjKZARgw01yDf9tHSJYOcBQuHqGUdRZQAWSCThizpQsOZJaUWB8l4mLDy
fScYx4cH12oPmCg3Fdd22oq7JN0ed9O3M7BLuzmI006uSWsB8fbfEcM+k+g+63Go
xpHYKM6VOzLlspFPFvVo3nwzvc787he2I9tIOPtCU0Hl4BBjjfGyB9FcnhNShNoe
dfVVzTaEWmHNGonTI61suwZJeyOTudzHqbgw5rAmqmJeV5R8gFSiGmzINlqoWZX6
TWDVz4Y1ma8QwA==
=/DKJ
-----END PGP SIGNATURE-----
Merge tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of updates for the interrupt subsystem:
- Make multiqueue devices which use the managed interrupt affinity
infrastructure work on PowerPC/Pseries. PowerPC does not use the
generic infrastructure for setting up PCI/MSI interrupts and the
multiqueue changes failed to update the legacy PCI/MSI
infrastructure. Make this work by passing the affinity setup
information down to the mapping and allocation functions.
- Move Jason Cooper from MAINTAINERS to CREDITS as his mail is
bouncing and he's not reachable. We hope all is well with him and
say thanks for his work over the years"
* tag 'irq-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
powerpc/pseries: Pass MSI affinity to irq_create_mapping()
genirq/irqdomain: Add an irq_create_mapping_affinity() function
MAINTAINERS: Move Jason Cooper to CREDITS
a CONFIG dependency and broke the build for certain configurations.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/M1OYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoS36EACmVnTsnsgoovEuYHEXBhBrtZMwnau/
4PerPvX3xDXYQ284/DrKhiJQnELAPpk0k3oMDk72T5tvkHNhHdGsXZ8fGIX+QKb+
oqhdwCYrCGCZuKDtWUz0k/r/XuqAs7aoGxI5nxoJDrw5NW68MU17nb+s4LnBcxzC
IdgY1SySXOeribsiRrEHjn9B5oZ+muvS8+ohfilmGwvW8nkACqgFh64GvXxQWC9J
IZi54zXTpsqHmgM6QldA/Z5ivPEXrBRn8iorZZn2dguVNUWTZjBfYbJ+NEPrS9Z9
k+kbX8a3PazZEC0c+XzApbUkega4DcK5J0cie8inwjRW5xe/orr3sL8gncynxDyf
179H/kcUfqpxHPG3Bwe207O1aCJUo9gKR74R0a94Lws+2uDhkMtUXI3tTkkI+vnz
tcY2ukj4ccicN3KGVL77BHmGs+wG+d37rrGJ9uCLDnPZjFPisXV/UBwaYDwMhTNA
8vGdDATPacvHVU63wYFijfnZEMQDVP6mu0aAaUVE3iF9ok9LvdAXpvy/h025eYTS
68nNisEqQt0to5ji6uFZK+iEjSrmbm6F2K7PVuplAkp6DyRcL/96zHaSZSPK6O8F
EDrbBr2aNlg2hHsR1KDiHl0u84oXpQ9Fr0SBNy/FeZoWzoqs6fAKiSrShtZ8007u
3RWtPy9SbKacNw==
=uXsp
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull intel_idle build fix from Thomas Gleixner:
"A tiny build fix for a recent change in the intel_idle driver which
missed a CONFIG dependency and broke the build for certain
configurations"
* tag 'locking-urgent-2020-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Build fix
- Move -Wcast-align to W=3, which tends to be false-positive and there
is no tree-wide solution.
- Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a preprocessor
option and makes sense for .S files as well.
- Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
- Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
- Fix undesirable line breaks in *.mod files.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl/MzyMVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGKJ8P/2kLq296XAPjqC90/LWMja8dsXO/
Wgaq8zC819x0JFuGdBKlwlFe3AvFYRtts9V5+mzjxvsOjH/6+xzyrXjRPCwZYqlj
XKC3ZwuS2SGDPFCriI1edwTUp5tyDnG/VBjqbf3ybQnz0LAShidXBD9IlM/XX9Rz
BlWqd7Uib50Pq8AfM2JVokrSmkkvhqxocIsmjTa0wvRjRAw7+aVkGNCWXqnTho7y
YuHmTWbmUQIROF3Bzs1fkGp+qaQofPRfA1tTwaTVvgmt8rEqyzXi11y6kj56INfg
/pq4O1KrplKtJFdrcjj4/eptqHG3I+Jq56qCHVescF6+bH6cc6BUL8qDdAzFZQai
e/pWCzREqFDKchEmT2d0Uzik8Zfxi5Cw68Otpzb4LqTUUxXSoRx1R9Of/Ei5QZum
6b6s9Q41UwH983UQCOOSGjXGZYP6fZG1a0XejbduYo7TL4KEECAO/FlLBWGttYH3
0i3aKz3aDKb/fo7hDbbqg+o6F0mShEraqxMmWgIvgGt+k76j0O0wS2KryqpTd7Vv
xg72suGM7f9QBA50lZ0r32fm86XnlqwQAm9ZMaSXR1Ii7j4F9UNRmR/FUYq7dPwa
COkuHr+9LqzV/tkluWi2rjLIGPaCuEVeSCcQ/wIDdp2iOyb54CbozwK0Yi2dxxus
jVFKwSaMUDHrkSj6
=/ysh
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Move -Wcast-align to W=3, which tends to be false-positive and there
is no tree-wide solution.
- Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a
preprocessor option and makes sense for .S files as well.
- Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
- Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
- Fix undesirable line breaks in *.mod files.
* tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: avoid split lines in .mod files
kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
kbuild: Hoist '--orphan-handling' into Kconfig
Kbuild: do not emit debug info for assembly with LLVM_IAS=1
kbuild: use -fmacro-prefix-map for .S sources
Makefile.extrawarn: move -Wcast-align to W=3
Merge misc fixes from Andrew Morton:
"12 patches.
Subsystems affected by this patch series: mm (memcg, zsmalloc, swap,
mailmap, selftests, pagecache, hugetlb, pagemap), lib, and coredump"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/mmap.c: fix mmap return value when vma is merged after call_mmap()
hugetlb_cgroup: fix offline of hugetlb cgroup with reservations
mm/filemap: add static for function __add_to_page_cache_locked
userfaultfd: selftests: fix SIGSEGV if huge mmap fails
tools/testing/selftests/vm: fix build error
mailmap: add two more addresses of Uwe Kleine-König
mm/swapfile: do not sleep with a spin lock held
mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING
mm: list_lru: set shrinker map bit when child nr_items is not zero
mm: memcg/slab: fix obj_cgroup_charge() return value handling
coredump: fix core_pattern parse error
zlib: export S390 symbols for zlib modules
On success, mmap should return the begin address of newly mapped area,
but patch "mm: mmap: merge vma after call_mmap() if possible" set
vm_start of newly merged vma to return value addr. Users of mmap will
get wrong address if vma is merged after call_mmap(). We fix this by
moving the assignment to addr before merging vma.
We have a driver which changes vm_flags, and this bug is found by our
testcases.
Fixes: d70cec8983 ("mm: mmap: merge vma after call_mmap() if possible")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Link: https://lkml.kernel.org/r/20201203085350.22624-1-liuzixian4@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Moreno was ruuning a kubernetes 1.19 + containerd/docker workload
using hugetlbfs. In this environment the issue is reproduced by:
- Start a simple pod that uses the recently added HugePages medium
feature (pod yaml attached)
- Start a DPDK app. It doesn't need to run successfully (as in transfer
packets) nor interact with real hardware. It seems just initializing
the EAL layer (which handles hugepage reservation and locking) is
enough to trigger the issue
- Delete the Pod (or let it "Complete").
This would result in a kworker thread going into a tight loop (top output):
1425 root 20 0 0 0 0 R 99.7 0.0 5:22.45 kworker/28:7+cgroup_destroy
'perf top -g' reports:
- 63.28% 0.01% [kernel] [k] worker_thread
- 49.97% worker_thread
- 52.64% process_one_work
- 62.08% css_killed_work_fn
- hugetlb_cgroup_css_offline
41.52% _raw_spin_lock
- 2.82% _cond_resched
rcu_all_qs
2.66% PageHuge
- 0.57% schedule
- 0.57% __schedule
We are spinning in the do-while loop in hugetlb_cgroup_css_offline.
Worse yet, we are holding the master cgroup lock (cgroup_mutex) while
infinitely spinning. Little else can be done on the system as the
cgroup_mutex can not be acquired.
Do note that the issue can be reproduced by simply offlining a hugetlb
cgroup containing pages with reservation counts.
The loop in hugetlb_cgroup_css_offline is moving page counts from the
cgroup being offlined to the parent cgroup. This is done for each
hstate, and is repeated until hugetlb_cgroup_have_usage returns false.
The routine moving counts (hugetlb_cgroup_move_parent) is only moving
'usage' counts. The routine hugetlb_cgroup_have_usage is checking for
both 'usage' and 'reservation' counts. Discussion about what to do with
reservation counts when reparenting was discussed here:
https://lore.kernel.org/linux-kselftest/CAHS8izMFAYTgxym-Hzb_JmkTK1N_S9tGN71uS6MFV+R7swYu5A@mail.gmail.com/
The decision was made to leave a zombie cgroup for with reservation
counts. Unfortunately, the code checking reservation counts was
incorrectly added to hugetlb_cgroup_have_usage.
To fix the issue, simply remove the check for reservation counts. While
fixing this issue, a related bug in hugetlb_cgroup_css_offline was
noticed. The hstate index is not reinitialized each time through the
do-while loop. Fix this as well.
Fixes: 1adc4d419a ("hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201203220242.158165-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The error handling in hugetlb_allocate_area() was incorrect for the
hugetlb_shared test case.
Previously the behavior was:
- mmap a hugetlb area
- If this fails, set the pointer to NULL, and carry on
- mmap an alias of the same hugetlb fd
- If this fails, munmap the original area
If the original mmap failed, it's likely the second one did too. If
both failed, we'd blindly try to munmap a NULL pointer, causing a
SIGSEGV. Instead, "goto fail" so we return before trying to mmap the
alias.
This issue can be hit "in real life" by forgetting to set
/proc/sys/vm/nr_hugepages (leaving it at 0), and then trying to run the
hugetlb_shared test.
Another small improvement is, when the original mmap fails, don't just
print "it failed": perror(), so we can see *why*. :)
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Alan Gilbert <dgilbert@redhat.com>
Link: https://lkml.kernel.org/r/20201204203443.2714693-1-axelrasmussen@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only x86 and PowerPC implement the pkey-xxx.h, and an error was reported
when compiling protection_keys.c.
Add a Arch judgment to compile "protection_keys" in the Makefile.
If other arch implement this, add the arch name to the Makefile.
eg:
ifneq (,$(findstring $(ARCH),powerpc mips ... ))
Following build errors:
pkey-helpers.h:93:2: error: #error Architecture not supported
#error Architecture not supported
pkey-helpers.h:96:20: error: `PKEY_DISABLE_ACCESS' undeclared
#define PKEY_MASK (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)
^
protection_keys.c:218:45: error: `PKEY_DISABLE_WRITE' undeclared
pkey_assert(flags & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE));
^
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Link: https://lkml.kernel.org/r/1606826876-30656-1-git-send-email-suxingxing@loongson.cn
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes attribution for the commits (among others)
- d4097456cd ("video/framebuffer: move the probe func into
.devinit.text in Blackfin LCD driver")
- 0312e024d6 ("mfd: mc13xxx: Add support for mc34708")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201127213358.3440830-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can't call kvfree() with a spin lock held, so defer it. Fixes a
might_sleep() runtime warning.
Fixes: 873d7bcfd0 ("mm/swapfile.c: use kvzalloc for swap_info_struct allocation")
Signed-off-by: Qian Cai <qcai@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201202151549.10350-1-qcai@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While I was doing zram testing, I found sometimes decompression failed
since the compression buffer was corrupted. With investigation, I found
below commit calls cond_resched unconditionally so it could make a
problem in atomic context if the task is reschedule.
BUG: sleeping function called from invalid context at mm/vmalloc.c:108
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog
3 locks held by memhog/946:
#0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160
#1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160
#2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0
CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
unmap_kernel_range_noflush+0x2eb/0x350
unmap_kernel_range+0x14/0x30
zs_unmap_object+0xd5/0xe0
zram_bvec_rw.isra.0+0x38c/0x8e0
zram_rw_page+0x90/0x101
bdev_write_page+0x92/0xe0
__swap_writepage+0x94/0x4a0
pageout+0xe3/0x3a0
shrink_page_list+0xb94/0xd60
shrink_inactive_list+0x158/0x460
We can fix this by removing the ZSMALLOC_PGTABLE_MAPPING feature (which
contains the offending calling code) from zsmalloc.
Even though this option showed some amount improvement(e.g., 30%) in
some arm32 platforms, it has been headache to maintain since it have
abused APIs[1](e.g., unmap_kernel_range in atomic context).
Since we are approaching to deprecate 32bit machines and already made
the config option available for only builtin build since v5.8, lastly it
has been not default option in zsmalloc, it's time to drop the option
for better maintenance.
[1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org
Fixes: e47110e905 ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Harish Sriram <harish@linux.ibm.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201117202916.GA3856507@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When investigating a slab cache bloat problem, significant amount of
negative dentry cache was seen, but confusingly they neither got shrunk
by reclaimer (the host has very tight memory) nor be shrunk by dropping
cache. The vmcore shows there are over 14M negative dentry objects on
lru, but tracing result shows they were even not scanned at all.
Further investigation shows the memcg's vfs shrinker_map bit is not set.
So the reclaimer or dropping cache just skip calling vfs shrinker. So
we have to reboot the hosts to get the memory back.
I didn't manage to come up with a reproducer in test environment, and
the problem can't be reproduced after rebooting. But it seems there is
race between shrinker map bit clear and reparenting by code inspection.
The hypothesis is elaborated as below.
The memcg hierarchy on our production environment looks like:
root
/ \
system user
The main workloads are running under user slice's children, and it
creates and removes memcg frequently. So reparenting happens very often
under user slice, but no task is under user slice directly.
So with the frequent reparenting and tight memory pressure, the below
hypothetical race condition may happen:
CPU A CPU B
reparent
dst->nr_items == 0
shrinker:
total_objects == 0
add src->nr_items to dst
set_bit
return SHRINK_EMPTY
clear_bit
child memcg offline
replace child's kmemcg_id with
parent's (in memcg_offline_kmem())
list_lru_del() between shrinker runs
see parent's kmemcg_id
dec dst->nr_items
reparent again
dst->nr_items may go negative
due to concurrent list_lru_del()
The second run of shrinker:
read nr_items without any
synchronization, so it may
see intermediate negative
nr_items then total_objects
may return 0 coincidently
keep the bit cleared
dst->nr_items != 0
skip set_bit
add scr->nr_item to dst
After this point dst->nr_item may never go zero, so reparenting will not
set shrinker_map bit anymore. And since there is no task under user
slice directly, so no new object will be added to its lru to set the
shrinker map bit either. That bit is kept cleared forever.
How does list_lru_del() race with reparenting? It is because reparenting
replaces children's kmemcg_id to parent's without protecting from
nlru->lock, so list_lru_del() may see parent's kmemcg_id but actually
deleting items from child's lru, but dec'ing parent's nr_items, so the
parent's nr_items may go negative as commit 2788cf0c40 ("memcg:
reparent list_lrus and free kmemcg_id on css offline") says.
Since it is impossible that dst->nr_items goes negative and
src->nr_items goes zero at the same time, so it seems we could set the
shrinker map bit iff src->nr_items != 0. We could synchronize
list_lru_count_one() and reparenting with nlru->lock, but it seems
checking src->nr_items in reparenting is the simplest and avoids lock
contention.
Fixes: fae91d6d8b ("mm/list_lru.c: set bit in memcg shrinker bitmap on first list_lru item appearance")
Suggested-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org> [4.19]
Link: https://lkml.kernel.org/r/20201202171749.264354-1-shy828301@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 10befea91b ("mm: memcg/slab: use a single set of kmem_caches
for all allocations") introduced a regression into the handling of the
obj_cgroup_charge() return value. If a non-zero value is returned
(indicating of exceeding one of memory.max limits), the allocation
should fail, instead of falling back to non-accounted mode.
To make the code more readable, move memcg_slab_pre_alloc_hook() and
memcg_slab_post_alloc_hook() calling conditions into bodies of these
hooks.
Fixes: 10befea91b ("mm: memcg/slab: use a single set of kmem_caches for all allocations")
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201127161828.GD840171@carbon.dhcp.thefacebook.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
'format_corename()' will splite 'core_pattern' on spaces when it is in
pipe mode, and take helper_argv[0] as the path to usermode executable.
It works fine in most cases.
However, if there is a space between '|' and '/file/path', such as
'| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will
be parsed as '', and users will get a 'Core dump to | disabled'.
It is not friendly to users, as the pattern above was valid previously.
Fix this by ignoring the spaces between '|' and '/file/path'.
Fixes: 315c69261d ("coredump: split pipe command whitespace before expanding template")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Wise <pabs3@bonedaddy.net>
Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"xargs echo" is not a safe way to remove line breaks because the input
may exceed the command line limit and xargs may break it up into
multiple invocations of echo. This should never happen because
scripts/gen_autoksyms.sh expects all undefined symbols are placed in
the second line of .mod files.
One possible way is to replace "xargs echo" with
"sed ':x;N;$!bx;s/\n/ /g'" or something, but I rewrote the code by
using awk because it is more readable.
This issue was reported by Sami Tolvanen; in his Clang LTO patch set,
$(multi-used-m) is no longer an ELF object, but a thin archive that
contains LLVM bitcode files. llvm-nm prints out symbols for each
archive member separately, which results a lot of dupications, in some
places, beyond the system-defined limit.
This problem must be fixed irrespective of LTO, and we must ensure
zero possibility of having this issue.
Link: https://lkml.org/lkml/2020/12/1/1658
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
This reverts commit d162219c65.
The device uses a VIRTIO device ID out of a not-for-production range.
Releasing Linux using an ID out of this range will make it conflict with
development setups. An official request to reserve an ID for an MEI
device is yet to be submitted to the virtio TC, thus there's no chance
it will be reserved and fixed in time before the next release.
Once requested it usually takes 2-3 weeks to land in the spec, which
means the device can be supported with the official ID in the next Linux
version if contributors act quickly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Wang Yu <yu1.wang@intel.com>
Cc: Liu Shuo <shuo.a.liu@intel.com>
Link: https://lore.kernel.org/r/20201205193625.469773-1-mst@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper
check must be:
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 25189d08e5 ("x86/sev-es: Add support for handling IOIO exceptions")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/160697106089.3146288.2052422845039649176.stgit@devnote2
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes. Use the new
for_each_insn_prefix() macro which does it correctly.
Debugged by Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message. ]
Fixes: 32d0b95300 ("x86/insn-eval: Add utility functions to get segment selector")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697104969.3146288.16329307586428270032.stgit@devnote2
Since insn.prefixes.nbytes can be bigger than the size of
insn.prefixes.bytes[] when a prefix is repeated, the proper check must
be
insn.prefixes.bytes[i] != 0 and i < 4
instead of using insn.prefixes.nbytes.
Introduce a for_each_insn_prefix() macro for this purpose. Debugged by
Kees Cook <keescook@chromium.org>.
[ bp: Massage commit message, sync with the respective header in tools/
and drop "we". ]
Fixes: 2b14449835 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
Reported-by: syzbot+9b64b619f10f19d19a7c@syzkaller.appspotmail.com
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/160697103739.3146288.7437620795200799020.stgit@devnote2
Pull input fixes from Dmitry Torokhov:
"A fix for 'RETRIGEN' handling in Atmel touch controllers that was
causing lost interrupts on systems using edge-triggered interrupts, a
quirk for i8042 driver, and a couple more fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: atmel_mxt_ts - fix lost interrupts
Input: xpad - support Ardwiino Controllers
Input: i8042 - add ByteSpeed touchpad to noloop table
Input: i8042 - fix error return code in i8042_setup_aux()
Input: soc_button_array - add missing include
Pull i2c fixes from Wolfram Sang:
"Some more I2C driver updates. IMX updates are a tad bigger, but not
exceptionally big"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mlxbf: Fix the return check of devm_ioremap and ioremap
i2c: mlxbf: select CONFIG_I2C_SLAVE
i2c: imx: Don't generate STOP condition if arbitration has been lost
i2c: imx: Check for I2SR_IAL after every byte
i2c: imx: Fix reset of I2SR_IAL flag
i2c: qcom: Fix IRQ error misassignement
i2c: qup: Fix error return code in qup_i2c_bam_schedule_desc()