While all the previous fixes to optimizeVertexCache invocation fixed the
vertex transform efficiency, the import code still was missing two
crucial recommendations from meshoptimizer documentation:
- All meshes should be optimized for vertex cache (this reorders
vertices for maximum fetch efficiency)
- When LODs are used with a shared vertex buffer, the vertex order
should be generated by doing a vertex fetch optimization on the
concatenated index buffer from coarse to fine LODs; this maximizes
fetch efficiency for coarse LODs
The last point is especially crucial for Mali GPUs; unlike other GPUs
where vertex order affects fetch efficiency but not shading, these GPUs
have various shading quirks (depending on the GPU generation) that
really require consecutive index ranges for each LOD, which requires the
second optimization mentioned above. However all of these also help
desktop GPUs and other mobile GPUs as well.
Because this optimization is "global" in the sense that it affects all
LODs and all vertex arrays in concert, I've taken this opportunity to
isolate all optimization code in this function and pull it out of
generate_lods and create_shadow_mesh; this doesn't change the vertex
cache efficiency, but makes the code cleaner. Consequently,
optimize_indices should be called after other functions like
create_shadow_mesh / generate_lods.
This required exposing meshopt_optimizeVertexFetchRemap; as a drive-by,
meshopt_simplifySloppy was never used so it's not exposed anymore - this
will simplify future meshopt upgrades if they end up changing the
function's interface.