- Adds Swappy for Android for stable frame pacing
- Implements pre-transformed Swapchain so that Godot's compositor is in
charge of rotating the screen instead of Android's compositor
(performance optimization for phones that don't have HW rotator)
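As a rough illustration of what pre-transforming implies for the swapchain size: when the surface is rotated by 90 or 270 degrees, the swapchain images keep the display's native orientation, so width and height are swapped relative to the current surface extent. This is a minimal sketch of that idea only; `SurfaceRotation`, `Extent` and `pretransformed_extent` are hypothetical names, not the ones used in this PR.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the surface transform reported by the platform.
enum class SurfaceRotation { R0, R90, R180, R270 };

struct Extent {
	uint32_t width;
	uint32_t height;
};

// With a pre-transformed swapchain the application renders already rotated,
// so for 90 and 270 degrees the swapchain extent is the current surface
// extent with width and height swapped. 0 and 180 degrees keep the extent.
Extent pretransformed_extent(Extent surface, SurfaceRotation rotation) {
	if (rotation == SurfaceRotation::R90 || rotation == SurfaceRotation::R270) {
		std::swap(surface.width, surface.height);
	}
	return surface;
}
```

The rotation itself then has to be applied when rendering (for example in the projection), which is the part Godot's compositor takes over.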
============================
The work was done in collaboration by TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
Changes from original PR:
- Removed the "display/window/frame_pacing/android/target_frame_rate"
option; Engine::get_max_fps is used instead.
- Target framerate can be changed at runtime using Engine::set_max_fps.
- Swappy is enabled by default.
- Added documentation.
- enable_auto_swap setting is replaced with swappy_mode.
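Swappy expresses its pacing target as a swap interval in nanoseconds, so a max FPS value has to be converted at some point. The following is a minimal sketch of such a conversion, not the code in this PR: `swap_interval_ns` is a hypothetical helper, and treating a max FPS of 0 as "fall back to the display refresh rate" is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper: converts a max FPS value into a swap interval in
// nanoseconds. A max_fps of 0 or less is treated as "no explicit cap" and
// falls back to the display refresh rate (assumed policy). If that is also
// unknown, 60 Hz is used as a last resort.
uint64_t swap_interval_ns(int max_fps, int display_refresh_hz) {
	const uint64_t ns_per_second = 1000000000ull;
	int fps = max_fps > 0 ? max_fps : display_refresh_hz;
	if (fps <= 0) {
		fps = 60;
	}
	return ns_per_second / static_cast<uint64_t>(fps);
}
```

Because the value is derived from the engine's max FPS, calling Engine::set_max_fps at runtime only requires recomputing the interval.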
Adds "--accurate-breadcrumbs" command line argument
Additionally, leave out breadcrumbs code in non-debug, non-dev builds.
Fix regression introduced in #98388 where command_insert_breadcrumb() is
called even in non-debug builds.
Fixes #98338
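A minimal sketch of the compile-out pattern described above. `SKETCH_BREADCRUMBS_ENABLED` and `CommandBufferSketch` are hypothetical names used for illustration; the actual guard and call sites in the engine may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical macro: only defined when building a debug or dev build.
#if defined(DEBUG_ENABLED) || defined(DEV_ENABLED)
#define SKETCH_BREADCRUMBS_ENABLED
#endif

struct CommandBufferSketch {
	std::vector<uint32_t> breadcrumbs;

	void insert_breadcrumb(uint32_t data) {
#ifdef SKETCH_BREADCRUMBS_ENABLED
		// Debug/dev builds record the marker for later crash analysis.
		breadcrumbs.push_back(data);
#else
		// Non-debug, non-dev builds: the body is compiled out entirely.
		(void)data;
#endif
	}
};
```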
Fix an error where barriers are expected to be inserted for the swap chain textures.
Add the relevant synchronization stages and accesses to resources between frames.
Fix an error where debug labels weren't finished correctly between frames.
Breadcrumbs are now behind an optional macro as they currently lead to synchronization errors which are harmless.
Also adds a new possible texture layout and API trait to support a particular behavior in D3D12 where only the COMMON layout is supported in copy queues. Fixes #98158.
These messages were printed every time the swapchain was recreated
(e.g. on viewport size change), which could easily end up spamming
the output.
The chosen present mode is already displayed when using the Print FPS
project setting or command line argument.
- Implements asynchronous transfer queues from PR #87590.
- Adds ubershaders that can run with specialization constants specified as push constants.
- Pipelines with specialization constants can compile in the background.
- Added monitoring for pipeline compilations.
- Materials and shaders can now be created asynchronously on background threads.
- Meshes that are loaded on background threads can also compile pipelines as part of the loading process.
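One way to read the ubershader and background compilation items together: while a specialized pipeline is still compiling, drawing can fall back to the ubershader variant. The sketch below illustrates only that lookup policy and is an assumption about the design, not the PR's code. `PipelineCacheSketch` is hypothetical, and `finish_one()` stands in for a background thread completing a compilation.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>

enum class PipelineKind { Ubershader, Specialized };

class PipelineCacheSketch {
	// Specialization key -> whether the specialized pipeline is compiled.
	std::unordered_map<uint32_t, bool> ready;
	// Keys waiting for a (simulated) background compilation.
	std::deque<uint32_t> pending;

public:
	// Returns which pipeline to draw with right now. The first request for
	// a key queues its compilation and returns the ubershader fallback.
	PipelineKind request(uint32_t spec_key) {
		auto it = ready.find(spec_key);
		if (it == ready.end()) {
			ready[spec_key] = false;
			pending.push_back(spec_key);
			return PipelineKind::Ubershader;
		}
		return it->second ? PipelineKind::Specialized : PipelineKind::Ubershader;
	}

	// Stands in for a background thread finishing one compilation.
	void finish_one() {
		if (pending.empty()) {
			return;
		}
		ready[pending.front()] = true;
		pending.pop_front();
	}

	std::size_t pending_count() const { return pending.size(); }
};
```

A count like `pending_count()` is also the kind of value a pipeline compilation monitor could expose.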
PR #90993 added several debugging utilities, among them advanced
memory tracking through the use of custom allocators and
VK_EXT_device_memory_report.
However, as issue #95967 reveals, it is dangerous to leave it on by
default because drivers (or even the Vulkan loader) can too easily
accidentally break custom allocators by allocating memory through std
malloc and then requesting that we deallocate it (or vice versa).
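To make the failure mode concrete, here is a minimal sketch of a tracking allocator that notices when it is asked to free a pointer it never handed out. `TrackingAllocatorSketch` is hypothetical and much simpler than the Vulkan allocation callbacks. An allocator that stores a hidden header in front of each block, instead of using a side table as below, would corrupt memory in the same situation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

class TrackingAllocatorSketch {
	// Live allocations made through this allocator, with their sizes.
	std::unordered_map<void *, std::size_t> live;
	std::size_t total = 0;

public:
	void *allocate(std::size_t size) {
		void *p = std::malloc(size);
		if (p) {
			live[p] = size;
			total += size;
		}
		return p;
	}

	// Returns false when asked to free a pointer this allocator never
	// handed out (the mismatch described above). The pointer is left
	// untouched in that case.
	bool deallocate(void *p) {
		auto it = live.find(p);
		if (it == live.end()) {
			return false;
		}
		total -= it->second;
		live.erase(it);
		std::free(p);
		return true;
	}

	std::size_t tracked_bytes() const { return total; }
};
```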
This PR makes the following changes:
- Adds --extra-gpu-memory-tracking command line argument
- Adds missing enum entries to
RenderingContextDriverVulkan::VkTrackedObjectType
- Adds RenderingDevice::get_driver_and_device_memory_report
- GDScript users can easily check via print(
RenderingServer.get_rendering_device().get_driver_and_device_memory_report()
)
- Uses get_driver_and_device_memory_report on device lost for appending
further info.
Fixes #95967
Features:
- Debug-only tracking of objects by type. See
get_driver_allocs_by_object_type et al.
- Debug-only Breadcrumb info for debugging GPU crashes and device lost
- Performance report per frame from get_perf_report
- Some VMA calls had to be modified in order to insert the necessary
memory callbacks
Functionality marked as "debug-only" is only available in debug or dev
builds.
Misc fixes:
- Early break optimization in RenderingDevice::uniform_set_create
============================
The work was done in collaboration by TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
Enables support for enhanced barriers if available.
Gets rid of the implementation of [CROSS_FAMILY_FALLBACK] in the D3D12 driver. The logic has been reimplemented at a higher level in RenderingDevice itself.
This fallback is only used if the RenderingDeviceDriver reports the API traits and the capability of sharing texture formats correctly. Aliases created in this way can only be used for sampling: never for writing. In most cases, the formats that do not support sharing do not support unordered access/storage writes in the first place.
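The decision described above can be sketched roughly as follows. This is one reading of the paragraph, not the logic in RenderingDevice: the field names are hypothetical, and the exact conditions reported by the driver are assumptions.

```cpp
enum class SharedViewPath { Native, SamplingOnlyAlias, Unsupported };

struct SharedViewRequest {
	bool driver_needs_fallback; // Hypothetical: API trait reported by the driver.
	bool formats_can_share; // Hypothetical: result of the driver's capability query.
	bool needs_write_access; // Storage/unordered access or other writes through the view.
};

// If the driver does not need the fallback, or the two formats can be
// shared directly, use the native path. Otherwise an alias is created,
// which is only valid for sampling and never for writing.
SharedViewPath choose_shared_view_path(const SharedViewRequest &r) {
	if (!r.driver_needs_fallback || r.formats_can_share) {
		return SharedViewPath::Native;
	}
	if (r.needs_write_access) {
		return SharedViewPath::Unsupported;
	}
	return SharedViewPath::SamplingOnlyAlias;
}
```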
This gives better explanations on why the cache may have been invalidated,
along with usual consequences.
These messages have also been moved to verbose prints, as users
cannot do anything to resolve them specifically (so they are mostly
relevant to developers).
It used to warn when opening a new project because no cache pre-exists,
which isn't particularly helpful.
Also include the rendering method in the cache filename, as it differs
between Forward+ and Mobile for the same GPU.
Not everything is implemented yet, due to either Godot or personal
limitations (I don't have all hardware in the world). A brief list of
the most important issues follows:
- Single-window only: the `DisplayServer` API doesn't expose enough
information for properly creating XDG shell windows.
- Very dumb rendering loop: this is very complicated, just know that
the low consumption mode is forced to 2000 Hz and some clever hacks are
in place to overcome a specific Wayland limitation. This will be
improved to the extent possible both downstream and upstream.
- Features yet to be implemented: IME, touch input, native file dialog,
drawing tablet (commented out due to a refactor), screen recording.
- Mouse passthrough can't be implemented through a polygon-based API; we
need a rect-based one.
- The cursor doesn't yet support fractional scaling.
- Auto scale is rounded up when using fractional scaling as we don't
have a per-window scale query API (basically we need
`DisplayServer::window_get_scale`).
- Building with `x11=no wayland=yes opengl=yes openxr=yes` fails.
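For the auto scale item, a minimal sketch of the rounding: the `wp_fractional_scale_v1` protocol reports the preferred scale as a numerator over a denominator of 120, so rounding up to an integer scale is a ceiling division. `rounded_up_scale` is a hypothetical helper, and whether the backend computes it exactly this way is an assumption.

```cpp
#include <cstdint>

// Denominator used by wp_fractional_scale_v1 for the preferred scale.
constexpr uint32_t SCALE_DENOMINATOR = 120;

// Rounds a fractional scale (given as a numerator over 120) up to the next
// integer scale, e.g. 1.5 (180/120) becomes 2.
uint32_t rounded_up_scale(uint32_t preferred_scale_120) {
	return (preferred_scale_120 + SCALE_DENOMINATOR - 1) / SCALE_DENOMINATOR;
}
```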
This also adds a new project property and editor setting for selecting the
default DisplayServer to start, to allow this backend to start first in
exported projects (X11 is still the default for now). The editor setting
always overrides the project setting.
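The precedence can be sketched like this. `pick_display_driver` is a hypothetical helper, and using an empty string for "not set" and applying the editor setting only when running the editor are assumptions made for the example.

```cpp
#include <string>

// Editor setting wins over the project setting; X11 remains the default
// when neither is set. Empty strings mean "not set" (assumed convention).
std::string pick_display_driver(const std::string &editor_setting,
		const std::string &project_setting, bool is_editor) {
	if (is_editor && !editor_setting.empty()) {
		return editor_setting;
	}
	if (!project_setting.empty()) {
		return project_setting;
	}
	return "x11";
}
```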
Special thanks to Drew Devault, toger5, Sebastian Krzyszkowiak, Leandro
Benedet Garcia, Subhransu, Yury Zhuravlev and Mara Huldra.
Adds a new system to automatically reorder commands, perform layout transitions and insert synchronization barriers based on the commands issued to RenderingDevice.
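To give an idea of the barrier part, here is a minimal sketch that tracks the last usage of each resource and records a barrier whenever either the previous or the next access is a write. It deliberately leaves out command reordering and layout transitions, and `BarrierTrackerSketch`, `Usage` and `Barrier` are hypothetical names rather than the types added by this PR.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class Usage { None, TransferWrite, ShaderRead, ShaderWrite };

struct Barrier {
	uint32_t resource;
	Usage from;
	Usage to;
};

inline bool is_write(Usage u) {
	return u == Usage::TransferWrite || u == Usage::ShaderWrite;
}

class BarrierTrackerSketch {
	std::unordered_map<uint32_t, Usage> last_usage;
	std::vector<Barrier> barriers;

public:
	// Records that a command uses `resource` with usage `next`.
	void use(uint32_t resource, Usage next) {
		Usage prev = Usage::None;
		auto it = last_usage.find(resource);
		if (it != last_usage.end()) {
			prev = it->second;
		}
		// A barrier is needed when either side writes (read-after-write,
		// write-after-read, write-after-write). Read-after-read needs none.
		if (prev != Usage::None && (is_write(prev) || is_write(next))) {
			barriers.push_back({ resource, prev, next });
		}
		last_usage[resource] = next;
	}

	const std::vector<Barrier> &recorded() const { return barriers; }
};
```

Because the barriers are derived from the commands themselves, callers no longer have to state them explicitly.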
Credit and thanks to @bruvzg for multiple build fixes, update of 3rd-party items and MinGW support.
Co-authored-by: bruvzg <7645683+bruvzg@users.noreply.github.com>