A coccicheck run provided information like the following.
drivers/gpu/drm/qxl/qxl_kms.c:295:1-7: ERROR: missing iounmap;
ioremap on line 178 and execution via conditional on line 185
Generated by: scripts/coccinelle/free/iounmap.cocci
A jump target was specified in an if branch. The corresponding function
call did not release the desired system resource then.
Thus use the label “rom_unmap” instead to fix the exception handling
for this function implementation.
Fixes: 5043348a49 ("drm: qxl: Fix error handling at qxl_device_init")
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://patchwork.freedesktop.org/patch/msgid/5e5ef9c4-4d85-3c93-cf28-42cfcb5b0649@web.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Implement Wa_1604555607 (set the DS pairing timer to 128 cycles).
FF_MODE2 is part of the register state context, that's why it is
implemented here.
At TGL A0 stepping, FF_MODE2 register read back is broken, hence
disabling the WA verification.
v2: Rebased on top of the WA refactoring (Oscar)
v3: Correctly add to ctx_workarounds_init (Michel)
v4:
uncore read is used [Tvrtko]
Macros as used for MASK definition [Chris]
v5:
Skip the Wa_1604555607 verification [Ram]
i915 ptr retrieved from engine. [Tvrtko]
v6:
Added wa_add as a wrapper for __wa_add [Chris]
wa_add is directly called instead of new wrapper [tvrtko]
BSpec: 19363
HSDES: 1604555607
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Ramalingam C <ramlingam.c@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v5]
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128021005.3350-1-ramalingam.c@intel.com
One does not lightly add a new hidden struct_mutex dependency deep within
the execbuf bowels! The immediate suspicion in seeing the whitelist
cached on the context, is that it is intended to be preserved between
batches, as the kernel is quite adept at caching small allocations
itself. But no, it's sole purpose is to serialise command submission in
order to save a kmalloc on a slow, slow path!
By removing the whitelist dependency from the context, our freedom to
chop the big struct_mutex is greatly augmented.
v2: s/set_bit/__set_bit/ as the whitelist shall never be accessed
concurrently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128113424.3885958-1-chris@chris-wilson.co.uk
Graphics APIs like OpenGL 4.4 and Vulkan require the graphics driver
to provide coherent graphics memory, meaning that the GPU sees any
content written to the coherent memory on the next GPU operation that
touches that memory, and the CPU sees any content written by the GPU
to that memory immediately after any fence object trailing the GPU
operation is signaled.
Paravirtual drivers that otherwise require explicit synchronization
needs to do this by hooking up dirty tracking to pagefault handlers
and buffer object validation.
Provide mm helpers needed for this and that also allow for huge pmd-
and pud entries (patch 1-3), and the associated vmwgfx code (patch 4-7).
The code has been tested and exercised by a tailored version of mesa
where we disable all explicit synchronization and assume graphics memory
is coherent. The performance loss varies of course; a typical number is
around 5%.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas_os@shipmail.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191113131639.4653-1-thomas_os@shipmail.org
Pull drm updates from Dave Airlie:
"Lots of stuff in here, though it hasn't been too insane this merge
apart from dealing with the security fun.
uapi:
- export different colorspace properties on DP vs HDMI
- new fourcc for ARM 16x16 block format
- syncobj: allow querying last submitted timeline value
- DRM_FORMAT_BIG_ENDIAN defined as unsigned
core:
- allow using gem vma manager in ttm
- connector/encoder/bridge doc fixes
- allow more than 3 encoders for a connector
- displayport mst suspend/resume reprobing support
- vram lazy unmapping, uniform vram mm and gem vram
- edid cleanups + AVI informframe bar info
- displayport helpers - dpcd parser added
dp_cec:
- Allow a connector to be associated with a cec device
ttm:
- pipelining with no_gpu_wait fix
- always keep BOs on the LRU
sched:
- allow free_job routine to sleep
i915:
- Block userptr from mappable GTT
- i915 perf uapi versioning
- OA stream dynamic reconfiguration
- make context persistence optional
- introduce DRM_I915_UNSTABLE Kconfig
- add fake lmem testing under unstable
- BT.2020 support for DP MSA
- struct mutex elimination
- Tigerlake display/PLL/power management improvements
- Jasper Lake PCH support
- refactor PMU for multiple GPUs
- Icelake firmware update
- Split out vga + switcheroo code
amdgpu:
- implement dma-buf import/export without helpers
- vega20 RAS enablement
- DC i2c over aux fixes
- renoir GPU reset
- DC HDCP support
- BACO support for CI/VI asics
- MSI-X support
- Arcturus EEPROM support
- Arcturus VCN encode support
- VCN dynamic powergating on RV/RV2
amdkfd:
- add navi12/14/renoir support to kfd
radeon:
- SI dpm fix ported from amdgpu
- fix bad DMA on ppc platforms
gma500:
- memory leak fixes
qxl:
- convert to new gem mmap
exynos:
- build warning fix
komeda:
- add aclk sysfs attribute
v3d:
- userspace cleanup uapi change
i810:
- fix for underflow in dispatch ioctls
ast:
- refactor show_cursor
mgag200:
- refactor show_cursor
arcgpu:
- encoder finding improvements
mediatek:
- mipi_tx, dsi and partial crtc support for MT8183 SoC
- rotation support
meson:
- add suspend/resume support
omap:
- misc refactors
tegra:
- DisplayPort support for Tegra 210, 186 and 194.
- IOMMU-backed DMA API fixes
panfrost:
- fix lockdep issue
- simplify devfreq integration
rcar-du:
- R8A774B1 SoC support
- fixes for H2 ES2.0
sun4i:
- vcc-dsi regulator support
virtio-gpu:
- vmexit vs spinlock fix
- move to gem shmem helpers
- handle large command buffers with cma"
* tag 'drm-next-2019-11-27' of git://anongit.freedesktop.org/drm/drm: (1855 commits)
drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10
drm/amdgpu: initialize vm_inv_eng0_sem for gfxhub and mmhub
drm/amd/amdgpu/sriov skip RLCG s/r list for arcturus VF.
drm/amd/amdgpu/sriov temporarily skip ras,dtm,hdcp for arcturus VF
drm/amdgpu/gfx10: re-init clear state buffer after gpu reset
merge fix for "ftrace: Rework event_create_dir()"
drm/amdgpu: Update Arcturus golden registers
drm/amdgpu/gfx10: fix out-of-bound mqd_backup array access
drm/amdgpu/gfx10: explicitly wait for cp idle after halt/unhalt
Revert "drm/amd/display: enable S/G for RAVEN chip"
drm/amdgpu: disable gfxoff on original raven
drm/amdgpu: remove experimental flag for Navi14
drm/amdgpu: disable gfxoff when using register read interface
drm/amdgpu/powerplay: properly set PP_GFXOFF_MASK (v2)
drm/amdgpu: fix bad DMA from INTERRUPT_CNTL2
drm/radeon: fix bad DMA from INTERRUPT_CNTL2
drm/amd/display: Fix debugfs on MST connectors
drm/amdgpu/nv: add asic func for fetching vbios from rom directly
drm/amdgpu: put flush_delayed_work at first
drm/amdgpu/vcn2.5: fix the enc loop with hw fini
...
The design of our interrupt handlers is that we ack the receipt of the
interrupt first, inside the critical section where the master interrupt
control is off and other cpus cannot start processing the next
interrupt; and then process the interrupt events afterwards. However,
Icelake introduced a whole new set of banked GT_IIR that are inherently
serialised and slow to retrieve the IIR and must be processed within the
critical section. We can still push our breadcrumbs out of this critical
section by using our irq_worker. On bdw+, this should not make too much
of a difference as we only slightly defer the breadcrumbs, but on icl+
this should make a big difference to our throughput of interrupts from
concurrently executing engines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127115813.3345823-1-chris@chris-wilson.co.uk
The expected downside to commit 58b4c1a07a ("drm/i915: Reduce nested
prepare_remote_context() to a trylock") was that it would need to return
-EAGAIN to userspace in order to resolve potential mutex inversion. Such
an unsightly round trip is unnecessary if we could atomically insert a
barrier into the i915_active_fence, so make it happen.
Currently, we use the timeline->mutex (or some other named outer lock)
to order insertion into the i915_active_fence (and so individual nodes
of i915_active). Inside __i915_active_fence_set, we only need then
serialise with the interrupt handler in order to claim the timeline for
ourselves.
However, if we remove the outer lock, we need to ensure the order is
intact between not only multiple threads trying to insert themselves
into the timeline, but also with the interrupt handler completing the
previous occupant. We use xchg() on insert so that we have an ordered
sequence of insertions (and each caller knows the previous fence on
which to wait, preserving the chain of all fences in the timeline), but
we then have to cmpxchg() in the interrupt handler to avoid overwriting
the new occupant. The only nasty side-effect is having to temporarily
strip off the RCU-annotations to apply the atomic operations, otherwise
the rules are much more conventional!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402
Fixes: 58b4c1a07a ("drm/i915: Reduce nested prepare_remote_context() to a trylock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
Problem:
Due to a race between drm_sched_cleanup_jobs in sched thread and
drm_sched_job_timedout in timeout work there is a possiblity that
bad job was already freed while still being accessed from the
timeout thread.
Fix:
Instead of just peeking at the bad job in the mirror list
remove it from the list under lock and then put it back later when
we are garanteed no race with main sched thread is possible which
is after the thread is parked.
v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
v3: Rebase on top of drm-misc-next. v2 is not needed anymore as
drm_sched_get_cleanup_job already has a lock there.
v4: Fix comments to relfect latest code in drm-misc.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/342356
The BitField of non privilege register address is only from bit 2 to 25.
v2: use REG_GENMASK instead. (Zhenyu)
Signed-off-by: Gao, Fred <fred.gao@intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Pull sound updates from Takashi Iwai:
"There have been some significant changes in the core side, both for
ALSA and ASoC, while lots of development have been seen in SOF, as
well as many small fixes/improvements for ASoC codecs and platforms.
Below is a highlight in this cycle:
Core:
- The unification of PCM vmalloc buffer allocation helpers into the
standard API
- Clean up of the default PCM mmap handling for vmalloc & SG-buffer
- Fix potential races at ALSA timer open
- A few new PCM API extensions; just preliminary core changes, the
actual changes in drivers will be merged in 5.6
- Continued ASoC componentization works; now almost everything is a
common ASoC component object. A lot of refactoring and
simplification have been done along with it.
ASoC:
- Many fixes to the Sound Open Firmware (SOF) code
- Wake on voice support for Chromebooks
- SPI support and trigger word detection for RT5677
- New drivers for Analog Devices ADAU7118, Intel Cannonlake systems
with RT1011 and RT5682, Texas Instruments TAS2562 and TAS2770
HD-audio:
- Improved Intel DSP configuration / probe code for SOF
- Plumbing the legacy HD-audio driver with Intel SOF HDMI
- DP-MST support for Nvidia HDMI codecs
- Realtek quirks cleanups and new additions as usual
Others:
- Lots of refactoring and cleanups for FireWire; period-size sharing,
h/w IRQ interval configuration, clock recovery improvements, etc
- USB-audio: Scarlett mixer quirks
- Cleanups of PCM calls in various drivers (including media and USB)
to adapt the core API changes"
* tag 'sound-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (497 commits)
ALSA: usb-audio: Fix Focusrite Scarlett 6i6 gen1 - input handling
ALSA: hda/realtek - Enable internal speaker of ASUS UX431FLC
ALSA: aloop: Fix dependency on timer API
ASoC: DMI long name - avoid to add board name if matches with product name
ASoC: improve the DMI long card code in asoc-core
ASoC: rsnd: fix DALIGN register for SSIU
ALSA: aloop: Avoid unexpected timer event callback tasklets
ALSA: aloop: Remove redundant locking in timer open function
ASoC: component: Add sync_stop PCM ops
ASoC: pcm: Make ioctl ops optional
ALSA: hda/hdmi - Clear codec->relaxed_resume flag at unbinding
ALSA: hda - Disable audio component for legacy Nvidia HDMI codecs
ALSA: cs4236: fix error return comparison of an unsigned integer
ALSA: usb-audio: Fix NULL dereference at parsing BADD
ALSA: usb-audio: Fix Scarlett 6i6 Gen 2 port data
ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop
ALSA: hda/realtek - Move some alc236 pintbls to fallback table
ALSA: hda/realtek - Move some alc256 pintbls to fallback table
ALSA: docs: Update about the new PCM sync_stop ops
ALSA: pcm: Add card sync_irq field
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- A comprehensive rewrite of the robust/PI futex code's exit handling
to fix various exit races. (Thomas Gleixner et al)
- Rework the generic REFCOUNT_FULL implementation using
atomic_fetch_* operations so that the performance impact of the
cmpxchg() loops is mitigated for common refcount operations.
With these performance improvements the generic implementation of
refcount_t should be good enough for everybody - and this got
confirmed by performance testing, so remove ARCH_HAS_REFCOUNT and
REFCOUNT_FULL entirely, leaving the generic implementation enabled
unconditionally. (Will Deacon)
- Other misc changes, fixes, cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
lkdtm: Remove references to CONFIG_REFCOUNT_FULL
locking/refcount: Remove unused 'refcount_error_report()' function
locking/refcount: Consolidate implementations of refcount_t
locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
locking/refcount: Move saturation warnings out of line
locking/refcount: Improve performance of generic REFCOUNT_FULL code
locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the <linux/refcount.h> header
locking/refcount: Remove unused refcount_*_checked() variants
locking/refcount: Ensure integer operands are treated as signed
locking/refcount: Define constants for saturation and max refcount values
futex: Prevent exit livelock
futex: Provide distinct return value when owner is exiting
futex: Add mutex around futex exit
futex: Provide state handling for exec() as well
futex: Sanitize exit state handling
futex: Mark the begin of futex exit explicitly
futex: Set task::futex_state to DEAD right after handling futex exit
futex: Split futex_mm_release() for exit/exec
exit/exec: Seperate mm_release()
futex: Replace PF_EXITPIDONE with a state
...
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- Dynamic tick (nohz) updates, perhaps most notably changes to force
the tick on when needed due to lengthy in-kernel execution on CPUs
on which RCU is waiting.
- Linux-kernel memory consistency model updates.
- Replace rcu_swap_protected() with rcu_prepace_pointer().
- Torture-test updates.
- Documentation updates.
- Miscellaneous fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
rcu: Suppress levelspread uninitialized messages
rcu: Fix uninitialized variable in nocb_gp_wait()
rcu: Update descriptions for rcu_future_grace_period tracepoint
rcu: Update descriptions for rcu_nocb_wake tracepoint
rcu: Remove obsolete descriptions for rcu_barrier tracepoint
rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
workqueue: Convert for_each_wq to use built-in list check
rcu: Several rcu_segcblist functions can be static
rcu: Remove unused function hlist_bl_del_init_rcu()
Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
...
Originally the last WREG32_SOC15() was a part of the if statement block
but the curly braces are on the wrong line.
Fixes: bb0db70f3f ("drm/amdgpu: separate JPEG1.0 code out from VCN1.0")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The documentation says the that PCI core handles this
for you unless you choose to implement it. Just rely
on the PCI core to handle the pci specific bits.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
This line of code was modified. However, comments
remained unchanged. As a result, comments and code are
mismatching.
[How]
Modifying comments to reflect code. At the same time,
explaining why the value was changed from 200ms to
3000ms.
Signed-off-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently ARCTURUS VF doesn't support jpeg ip block.
Skip jpeg ip block in case guest driver load fail.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Zhexi Zhang <zhexi.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Be less pessimistic about estimated page table use for KFD. Most
allocations use 2MB pages and therefore need less VRAM for page
tables. This allows more VRAM to be used for applications especially
on large systems with many GPUs and hundreds of GB of system memory.
Example: 8 GPUs with 32GB VRAM each + 256GB system memory = 512GB
Old page table reservation per GPU: 1GB
New page table reservation per GPU: 32MB
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
previously event_property_validate was only called after we enabled the display.
But after "Refactor HDCP to handle multiple displays per link" this function
can be called at any time. In certain cases we don't have a aconnector
[How]
Null check aconnector and exit early. This is ok because we only need to check the
ENABLED->DESIRED transition if a connector exists.
Fixes: b1abe5586f ("drm/amd/display: Refactor HDCP to handle multiple displays per link")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KFD has been verified to function on POWER systems (Talos II / Vega 64).
It should be available as a kernel configuration option on these systems.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Be less pessimistic about estimated page table use for KFD. Most
allocations use 2MB pages and therefore need less VRAM for page
tables. This allows more VRAM to be used for applications especially
on large systems with many GPUs and hundreds of GB of system memory.
Example: 8 GPUs with 32GB VRAM each + 256GB system memory = 512GB
Old page table reservation per GPU: 1GB
New page table reservation per GPU: 32MB
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>