* the file's doc-comment was misleading and did not focus on the correct aspect of SIMD
* added cpu flag awareness to `suggestVectorLengthForCpu` in order to provide a more accurate vector length
Now we generate debug undefined constants when the user asks for them to dedup across the function decl. This takes 2 instructions instead of 7 in the RISC-V backend.
TODO, we need to dedupe across function decl boundaries.
Compilation errors now report a failure on rebuilds triggered by file
system watches.
Compiler crashes now report failure correctly on rebuilds triggered by
file system watches.
The compiler subprocess is restarted if a broken pipe is encountered on
a rebuild.
Remove --debug-incremental
This flag is also added to the build system. Importantly, this tells
Compile step whether or not to keep the compiler running between
rebuilds. It defaults off because it is currently crashing
zirUpdateRefs.
Changes the `make` function signature to take an options struct, which
additionally includes `watch: bool`. I intentionally am not exposing
this information to configure phase logic.
Also adds global zig cache to the compiler cache prefixes.
Closes#20600
Now that we use the PEB to get the precise length of the command line string, there's no need for a multi-item pointer/sliceTo call. This provides a minor speedup:
Benchmark 1 (153 runs): benchargv-before.exe
measurement mean ± σ min … max outliers delta
wall_time 32.7ms ± 429us 32.1ms … 36.9ms 1 ( 1%) 0%
peak_rss 6.49MB ± 5.62KB 6.46MB … 6.49MB 14 ( 9%) 0%
Benchmark 2 (157 runs): benchargv-after.exe
measurement mean ± σ min … max outliers delta
wall_time 31.9ms ± 236us 31.4ms … 32.7ms 4 ( 3%) ⚡- 2.4% ± 0.2%
peak_rss 6.49MB ± 4.77KB 6.46MB … 6.49MB 14 ( 9%) + 0.0% ± 0.0%
Previously, to ensure args were encoded as well-formed WTF-8 (i.e. no encoded surrogate pairs), the code unit would be encoded and then the last 6 emitted bytes would be checked to see if they were a surrogate pair, and this was done for any emitted code unit (although this was not necessary, it should have only been done when emitting a low surrogate).
After this commit, we still want to ensure well-formed WTF-8, but, to do so, the last emitted code point is stored, meaning we can just directly check that the last code unit is a high surrogate and the current code unit is a low surrogate to determine if we have a surrogate pair.
This provides some performance benefit over and above a "use the same strategy as before but only check when we're emitting a low surrogate" implementation:
Benchmark 1 (111 runs): benchargv-master.exe
measurement mean ± σ min … max outliers delta
wall_time 45.2ms ± 532us 44.5ms … 49.4ms 2 ( 2%) 0%
peak_rss 6.49MB ± 3.94KB 6.46MB … 6.49MB 10 ( 9%) 0%
Benchmark 2 (154 runs): benchargv-storelast.exe
measurement mean ± σ min … max outliers delta
wall_time 32.6ms ± 293us 32.2ms … 34.2ms 8 ( 5%) ⚡- 27.8% ± 0.2%
peak_rss 6.49MB ± 5.15KB 6.46MB … 6.49MB 15 (10%) - 0.0% ± 0.0%
Benchmark 3 (131 runs): benchargv-onlylow.exe
measurement mean ± σ min … max outliers delta
wall_time 38.4ms ± 257us 37.9ms … 39.6ms 5 ( 4%) ⚡- 15.1% ± 0.2%
peak_rss 6.49MB ± 5.70KB 6.46MB … 6.49MB 9 ( 7%) - 0.0% ± 0.0%
Before this commit, the WTF-16 command line string would be converted to WTF-8 in `init`, and then a second buffer of the WTF-8 size + 1 would be allocated to store the parsed arguments. The converted WTF-8 command line would then be parsed and the relevant bytes would be copied into the argument buffer before being returned.
After this commit, only the WTF-8 size of the WTF-16 string is calculated (without conversion) which is then used to allocate the buffer for the parsed arguments. Parsing is then done on the WTF-16 slice directly, with the arguments being converted to WTF-8 on-the-fly.
This has a few (minor) benefits:
- Cuts the amount of memory allocated by ArgIteratorWindows in half (or better)
- Makes the total amount of memory allocated by ArgIteratorWindows predictable, since, before, the upfront `wtf16LeToWtf8Alloc` call could end up allocating more-memory-than-necessary temporarily due to its internal use of an ArrayList. Now, the amount of memory allocated is always exactly `calcWtf8Len(cmd_line) + 1`.
This was causing zig2.exe to crash during bootstrap, because there was an atomic
load of read-only memory, and the attempt to write to it as part of the (idempotent)
atomic exchange was invalid.
Aligned reads (of u32 / u64) are atomic on x86 / x64, so this is replaced with an
optimization-proof load (`__iso_volatile_load8*`) and a reordering barrier.
This allows the mutate mutex to only be locked during actual grows,
which are rare. For the lists that didn't previously have a mutex, this
change has little effect since grows are rare and there is zero
contention on a mutex that is only ever locked by one thread. This
change allows `extra` to be mutated without racing with a grow.
Makes the build runner compile successfully for non-linux targets;
printing an error if you ask for --watch rather than making build
scripts fail to compile.
it's not advertised in the usage and only available in debug builds of
the compiler. Makes it easier to test changes to the build runner that
might affect targets differently.
Updates the build runner to unconditionally require a zig lib directory
parameter. This parameter is needed in order to correctly understand
file system inputs from zig compiler subprocesses, since they will refer
to "the zig lib directory", and the build runner needs to place file
system watches on directories in there.
The build runner's fanotify file watching implementation now accounts
for when two or more Cache.Path instances compare unequal but ultimately
refer to the same directory in the file system.
Breaking change: std.Build no longer has a zig_lib_dir field. Instead,
there is the Graph zig_lib_directory field, and individual Compile steps
can still have their zig lib directories overridden. I think this is
unlikely to break anyone's build in practice.
The compiler now sends a "file_system_inputs" message to the build
runner which shares the full set of files that were added to the cache
system with the build system, so that the build runner can watch
properly and redo the Compile step. This is implemented for whole cache
mode but not yet for incremental cache mode.