PR #90993 added several debugging utilities.
Among them, advanced memory tracking through the use of custom
allocators and VK_EXT_device_memory_report.
However as issue #95967 reveals, it is dangerous to leave it on by
default because drivers (or even the Vulkan loader) can too easily
accidentally break custom allocators by allocating memory through std
malloc but then request us to deallocate it (or viceversa).
This PR fixes the following problems:
- Adds --extra-gpu-memory-tracking cmd line argument
- Adds missing enum entries to
RenderingContextDriverVulkan::VkTrackedObjectType
- Adds RenderingDevice::get_driver_and_device_memory_report
- GDScript users can easily check via print(
RenderingServer.get_rendering_device().get_driver_and_device_memory_report()
)
- Uses get_driver_and_device_memory_report on device lost for appending
further info.
Fixes#95967
- Implements the concept of GDExtension loaders that can be used to customize how GDExtensions are loaded and initialized.
- Moves the parsing of `.gdextension` config files to the new `GDExtensionLibraryLoader`.
- `GDExtensionManager` is now meant to be the main way to load/unload extensions and can optionally take a `GDExtensionLoader`.
- `EditorFileSystem` avoids unloading extensions if the file still exists, this should prevent unloading extensions that are outside the user project.
Features:
- Debug-only tracking of objects by type. See
get_driver_allocs_by_object_type et al.
- Debug-only Breadcrumb info for debugging GPU crashes and device lost
- Performance report per frame from get_perf_report
- Some VMA calls had to be modified in order to insert the necessary
memory callbacks
Functionality marked as "debug-only" is only available in debug or dev
builds.
Misc fixes:
- Early break optimization in RenderingDevice::uniform_set_create
============================
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
1. Make handling of user tokens atomic:
Loads started with the external-facing API used to perform a two-step setup of the user token. Between both, the mutex was unlocked without its reference count having been increased. A non-user-initiated load could therefore destroy the load task when it unreferenced the token.
Those stages now happen atomically so in the one hand, the described race condition can't happen so the load task life insurance doesn't have a gap anymore and, on the other hand, the ugliness that the call to load could return `ERR_BUSY` if happening while other thread was between both steps is gone.
The code has been refactored so the user token concerns are still outside the inner load start function, which is agnostic to that for a cleaner implementation.
2. Clear ambiguity between load operations running on `WorkerThreadPool`:
The two cases are: single-loaded thread directly started by a user pool task and a load started by the system as part of a multi-threaded load.
Since ensuring all the code dealing with this distinction would make it very complex, and error-prone, a different measure is applied instead: just take one of the cases out of the dicotomy. We now ensure every load happening on a pool thread has been initiated by the system.
The way of achieving that is that a single-threaded user-started load initiated from a pool thread, is run as another task.
Benefits:
- Simpler code. The main load function is renamed so it's apparent that it's not just a thread entry point anymore.
- Cache and thread modes of the original task are honored. A beautiful consequence of this is that, unlike formerly, re-issued loads can use the resource cache, which makes this mechanism much more performant.
- The newly added getter for caller task id in WorkerThreadPool allows to remove the custom tracking of that in ResourceLoader.
- The check to replace a cached resource and the replacement itself happen atomically. That fixes deadlock prevention leading to multiple resource instances of the same one on disk. As a side effect, it also makes the regular check for replace load mode more robust.
Changes to reduce the latency between changing node selection in the editor and seeing the new node reflected in the Inspector tab
- Use Vector instead of List for ThemeOwner::get_theme_type_dependencies and related functions
- Use Vector instead of List for ThemeContext::themes, set_themes(), and get_themes()
- Add ClassDB:get_inheritance_chain_nocheck to get all parent/ancestor classes at once, to avoid repeated ClassDB locking overhead
- Update BIND_THEME_ITEM macros and ThemeDB::update_class_instance_items to use provided StringNames for call to ThemeItemSetter, instead of creating a new StringName in each call
These changes reduce the time taken by EditorInspector::update_tree by around 30-35%
- Returns an empty list when there's not registered plugins, thus preventing the creation of spurious iterator objects
- Inline `Godot#getRotatedValues(...)` given it only had a single caller. This allows to remove the allocation of a float array on each call and replace it with float variables
- Disable sensor events by default. Sensor events can fired at 10-100s Hz taking cpu and memory resources. Now the use of sensor data is behind a project setting allowing projects that have use of it to enable it, while other projects don't pay the cost for a feature they don't use
- Create a pool of specialized input `Runnable` objects to prevent spurious, unbounded `Runnable` allocations
- Disable showing the boot logo for Android XR projects
- Delete locale references of jni strings