* reduce iteration cost by not tracking unused entries
* avoid emitting unused abbrevs to `.debug_abbrev`
* get the compiler executable passing `llvm-dwarfdump --verify`
* make it possible to skip `.debug_line` padding much more quickly
* Indices of referenced captures
* Line and column of `@src()`
The second point aligns with a reversal of the "incremental compilation"
section of https://github.com/ziglang/zig/issues/2029#issuecomment-645793168.
This reversal was already done as #17688 (46a6d50), with the idea to
push incremental compilation down the line. My proposal is to keep it as
comptime-known, and simply re-analyze uses of `@src()` whenever their
line/column change.
I think this decision is reasonable for a few reasons:
* The Zig compiler is quite fast. Occasionally re-analyzing a few
functions containing `@src()` calls is perfectly acceptable and won't
noticably impact update times.
* The system described by Andrew in #2029 is currently vaporware.
* The system described by Andrew in #2029 is non-trivial to implement.
In particular, it requires some way to have backends update a single
global in certain cases, without re-doing semantic analysis. There is
no other part of incremental compilation which requires this.
* Having `@src().line` be comptime-known is useful. For instance, #17688
was justified by broken Tracy integration because the source line
couldn't be comptime-known.
In a `memoized_call`, store how many backwards braches the call
performs. Add this to `sema.branch_count` when using a memoized call. If
this exceeds the quota, perform a non-memoized call to get a correct
"exceeded X backwards branches" error.
Also, do not memoize calls which do `@setEvalBranchQuota` or similar, as
this affects global state which must apply to the caller.
Change some eval branch quotas so that the compiler itself still builds correctly.
This commit manually changes a file in Aro which is automatically
generated. The sources which generate the file are not in this repo.
Upstream Aro should make the suitable changes on their end before the
next sync of Aro sources into the Zig repo.
Closes#21132
According to the XDG Base Directory specification
(https://specifications.freedesktop.org/basedir-spec/latest/#variables),
empty values for these environment variables should be treated the same
as if they are unset. Specifically, for the instances changed in this
commit,
> $XDG_DATA_HOME defines the base directory relative to which
> user-specific data files should be stored. If $XDG_DATA_HOME is either
> not set **or empty**, a default equal to $HOME/.local/share should be
> used.
and
> $XDG_CACHE_HOME defines the base directory relative to which
> user-specific non-essential data files should be stored. If
> $XDG_CACHE_HOME is either not set **or empty**, a default equal to
> $HOME/.cache should be used.
(emphasis mine)
In addition to the case mentioned in the linked issue, all other uses of
XDG environment variables were corrected.
* std.c.darwin: add missing CPUFAMILY fields
* std.zig.system.detectNativeCpuAndFeatures: add missing darwin fields
* add comment so the prong isnt lost and easily discoverable during next llvm upgrade
A compilation build step for which the binary is not required could not
be compiled previously. There were 2 issues that caused this:
- The compiler communicated only the results of the emitted binary and
did not properly communicate the result if the binary was not emitted.
This is fixed by communicating the final hash of the artifact path (the
hash of the corresponding /o/<hash> directory) and communicating this
instead of the entire path. This changes the zig build --listen protocol
to communicate hashes instead of paths, and emit_bin_path is accordingly
renamed to emit_digest.
- There was an error related to the default llvm object path when
CacheUse.Whole was selected. I'm not really sure why this didn't manifest
when the binary is also emitted.
This was fixed by improving the path handling related to flush() and
emitLlvmObject().
In general, this commit also improves some of the path handling throughout
the compiler and standard library.
This replaces the constant `Zir.Inst.Ref` tags (and the analagous tags
in `Air.Inst.Ref`, `InternPool.Index`) referring to types in
`std.builtin` with a ZIR instruction `extended(builtin_type(...))` which
instructs Sema to fetch such a type, effectively as if it were a
shorthand for the ZIR for `@import("std").builtin.xyz`.
Previously, this was achieved through constant tags in `Ref`. The
analagous `InternPool` indices began as `simple_type` values, and were
later rewritten to the correct type information. This system was kind of
brittle, and more importantly, isn't compatible with incremental
compilation of std, since incremental compilation relies on the ability
to recreate types at different indices when they change. Replacing the
old system with this instruction slightly increases the size of ZIR, but
it simplifies logic and allows incremental compilation to work correctly
on the standard library.
This shouldn't have a significant impact on ZIR size or compiler
performance, but I will take measurements in the PR to confirm this.
The kernel sets r7 to 0 (success) or -1 (error), and stores the result in r2.
When r7 is -1 and the result is positive, it needs to be negated to get the
errno value that higher-level code, such as errnoFromSyscall(), expects to see.
The old code was missing the check that r2 is positive, but was also doing the
r7 check incorrectly; since it can only be set to 0 or -1, the blez instruction
would always branch.
In practice, this fix is necessary for e.g. the ENOSYS error to be interpreted
correctly. This manifested as hitting an unreachable branch when calling
process_vm_readv() in std.debug.MemoryAccessor.
The old logic here had bitrotted, largely because there were some
incorrect `else` cases. This is now implemented correctly for all
current ZIR instructions. This prevents instructions being lost in
incremental updates, which is important for updates to be minimal.
Simplifies code in docs creation where we used `std.tar.output.Header`.
Writer uses that Header internally and provides higher level interface.
Updates checksum on write, handles long file names, allows setting mtime and file permission mode. Provides handy interface for passing `Dir.WalkerEntry`.
For csky, we can just always do the gb initialization. For riscv, the gp code is
temporarily pulled above the main switch until the blocking issue is resolved.
It's entirely unclear whether this should map to POWERPC or POWERPCFP, and as I
can find no evidence of people producing PE files for PowerPC since Windows NT,
let's just not make a likely-wrong guess. We can revisit this if the need ever
actually arises.
All of these were mapping to types that are little endian. In fact, I can find
no evidence that either Windows or UEFI have ever been used on big endian
systems.
Using --watch I noticed a couple of issues with my initial attempt. 1) The index I used as 'completion key' was not stable over time, when directories are being added/removed the key no longer corresponds with the intended dir. 2) There exists a race condition in which we receive a completion notification for a directory that was removed. My solution is to generate a key value and associate it with each Directory.
Two fixes here:
Sort by addresses after generating the line table. Debug information in
the wild is not sorted and the rest of the implementation requires this
data to be sorted.
Handle DW.LNE.end_sequence correctly. When I originally wrote this code,
I misunderstood what this opcode was supposed to do. Now I understand
that it marks the *end* of an address range, meaning the current address
does *not* map to the current line information.
This fixes source location information for a big chunk of ReleaseSafe
code.
The implementation assumed that compilation units did not overlap, which
is not the case. The new implementation uses .debug_ranges to iterate
over the requested PCs.
This partially resolves#20990. The dump-cov tool is fixed but the same
fix needs to be applied to `std.Build.Fuzz.WebServer` (sorting the PC
list before passing it to be resolved by debug info).
I am observing LLVM emit multiple 8-bit counters for the same PC
addresses when enabling `-fsanitize-coverage=inline-8bit-counters`. This
seems like a bug in LLVM. I can't fathom why that would be desireable.
Some of this is arbitrary since spirv (as opposed to spirv32/spirv64) refers to
the version with logical memory layout, i.e. no 'real' pointers. This change at
least matches what clang does.
Versions can simply use the normal version range mechanism, or alternatively an
Abi tag if that makes more sense. For now, we only care about 4.5 anyway.
std.crypto has quite a few instances of breaking naming conventions.
This is the beginning of an effort to address that.
Deprecates `std.crypto.utils`.
* Upgrade from u8 to usize element types.
- WebAssembly assumes u64. It should probably try to be target-aware
instead.
* Move the covered PC bits to after the header so it goes on the same
page with the other rapidly changing memory (the header stats).
depends on the semantics of accepted proposal #19755closes#20994
The previous mecanism for linux distributions to delivers debug info into `/usr/lib/debug` no longer seems in use.
the current mecanism often is using `debuginfod` (https://sourceware.org/elfutils/Debuginfod.html)
This commit only tries to load already available debuginfo but does not try to make any download requests.
the user can manually run `debuginfod-find debuginfo PATH` to populate the cache.
- look up the debuglink file in the directory of the executable file (instead of the cwd)
- fix parsing of debuglink section (the 4-byte alignement is within the file, unrelated to the in-memory address)
The signature is documented as:
int link(const char *, const char *);
(see https://man7.org/linux/man-pages/man2/link.2.html or https://man.netbsd.org/link.2)
And its not some Linux extension, the [syscall
implementation](21b136cc63/fs/namei.c (L4794-L4797))
only expects two arguments too.
It probably *should* have a flags parameter, but its too late now.
I am a bit surprised that linking glibc or musl against code that invokes
a 'link' with three parameters doesn't fail (at least, I couldn't get any
local test cases to trigger a compile or link error).
The test case in std/posix/test.zig is currently disabled, but if I
manually enable it, it works with this change.
Due to the `std.crypto.ecdsa.KeyPair.create` taking and optional of seed, even if the seed is generated, cross-compiling to the environments without standard random source (eg. wasm) (`std.crypto.random.bytes`) will fail to compile.
This commit changes the API of the problematic function and moves the random seed generation to a new utility function.
this fix bypasses the slice bounds, reading garbage data for up to the
last 7 bits (which are technically supposed to be ignored). that's going
to need to be fixed, let's fix that along with switching from byte elems
to usize elems.
* libfuzzer: track unique runs instead of deduplicated runs
- easier for consumers to notice when to recheck the covered bits.
* move common definitions to `std.Build.Fuzz.abi`.
build runner sends all the information needed to fuzzer web interface
client needed in order to display inline coverage information along with
source code.
* libfuzzer: close file after mmap
* fuzzer/main.js: connect with EventSource and debug dump the messages.
currently this prints how many fuzzer runs have been attempted to
console.log.
* extract some `std.debug.Info` logic into `std.debug.Coverage`.
Prepares for consolidation across multiple different executables which
share source files, and makes it possible to send all the
PC/SourceLocation mapping data with 4 memcpy'd arrays.
* std.Build.Fuzz:
- spawn a thread to watch the message queue and signal event
subscribers.
- track coverage map data
- respond to /events URL with EventSource messages on a timer
* std.debug.Dwarf: add `sortCompileUnits` along with a field to track
the state for the purpose of assertions and correct API usage.
This makes batch lookups faster.
- in the future, findCompileUnit should be enhanced to rely on sorted
compile units as well.
* implement `std.debug.Dwarf.resolveSourceLocations` as well as
`std.debug.Info.resolveSourceLocations`. It's still pretty slow, since
it calls getLineNumberInfo for each array element, repeating a lot of
work unnecessarily.
* integrate these APIs with `std.Progress` to understand what is taking
so long.
The output I'm seeing from this tool shows a lot of missing source
locations. In particular, the main area of interest is missing for my
tokenizer fuzzing example.
with debug info resolved.
begin efforts of providing `std.debug.Info`, a cross-platform
abstraction for loading debug information into an in-memory format that
supports queries such as "what is the source location of this virtual
memory address?"
Unlike `std.debug.SelfInfo`, this API does not assume the debug
information in question happens to match the host CPU architecture, OS,
or other target properties.
* new .zig-cache subdirectory: 'v'
- stores coverage information with filename of hash of PCs that want
coverage. This hash is a hex encoding of the 64-bit coverage ID.
* build runner
* fixed bug in file system inputs when a compile step has an
overridden zig_lib_dir field set.
* set some std lib options optimized for the build runner
- no side channel mitigations
- no Transport Layer Security
- no crypto fork safety
* add a --port CLI arg for choosing the port the fuzzing web interface
listens on. it defaults to choosing a random open port.
* introduce a web server, and serve a basic single page application
- shares wasm code with autodocs
- assets are created live on request, for convenient development
experience. main.wasm is properly cached if nothing changes.
- sources.tar comes from file system inputs (introduced with the
`--watch` feature)
* receives coverage ID from test runner and sends it on a thread-safe
queue to the WebServer.
* test runner
- takes a zig cache directory argument now, for where to put coverage
information.
- sends coverage ID to parent process
* fuzzer
- puts its logs (in debug mode) in .zig-cache/tmp/libfuzzer.log
- computes coverage_id and makes it available with
`fuzzer_coverage_id` exported function.
- the memory-mapped coverage file is now namespaced by the coverage id
in hex encoding, in `.zig-cache/v`
* tokenizer
- add a fuzz test to check that several properties are upheld
When a unique run is encountered, track it in a bit set memory-mapped
into the fuzz directory so it can be observed by other processes, even
while the fuzzer is running.
This is arbitrary since spirv (as opposed to spirv32/spirv64) refers to the
version with logical memory layout, i.e. no 'real' pointers. This change at
least matches what clang does.
It is impossible to even build projects like glibc when targeting a generic
SPARC v8 CPU; LEON3 is effectively considered the baseline for `sparc-linux-gnu`
now, particularly due to it supporting a CASA instruction similar to the one in
SPARC v9. However, it's slightly incompatible with SPARC v9 due to having a
different ASI tag, so resulting binaries would not be portable to regular SPARC
CPUs. So, as the least bad option, make v9 the baseline for sparc32.
`__xl_a` is just a global variable containing a null function pointer. There's
nothing magical about it or its name at all.
The section names used on `__xl_a` and `__xl_b` (`.CRT$XLA` and `.CRT$XLZ`) are
the real magic here. The compiler emits TLS variables into `.CRT$XL<x>`
sections, where `x` is an uppercase letter between A and Z (exclusive). The
linker then sorts those sections alphabetically (due to the `$`), and the result
is a neat array of TLS initialization callbacks between `__xl_a` and `__xl_z`.
That array is null-terminated, though! Normally, `__xl_z` serves as the null
terminator; however, by pointing `AddressesOfCallBacks` to `__xl_a`, which just
contains a null function pointer, we've effectively made it so that the PE
loader will just immediately stop invoking TLS callbacks. Fix that by pointing
to the first actual TLS callback instead (or `__xl_z` if there are none).
If there is a VDSO, it will have clock_gettime(). The main thing we're concerned
about is architectures that don't have a VDSO at all, of which there are a few.
There are two concepts here: one for whether dwarf supports unwinding on
that target, and another for whether the Zig standard library
implements it yet.
...which have a ucontext_t but not a PC register. The current stack
unwinding implementation does not yet support this architecture.
Also fix name of `std.debug.SelfInfo.openSelf` to remove redundancy.
Also removed this hook into root providing an "openSelfDebugInfo"
function. Sorry, this debugging code is not of sufficient quality to
offer a plugin API right now.
This target triple was weird on multiple levels:
* The `ilp32` ABI is the soft float ABI. This is not the main ABI we want to
support on RISC-V; rather, we want `ilp32d`.
* `gnuilp32` is a bespoke tag that was introduced in Zig. The rest of the world
just uses `gnu` for RISC-V target triples.
* `gnu_ilp32` is already the name of an ILP32 ABI used on AArch64. `gnuilp32` is
too easy to confuse with this.
* We don't use this convention for `riscv64-linux-gnu`.
* Supporting all RISC-V ABIs with this convention will result in combinatorial
explosion; see #20690.
After this commit:
`std.debug.SelfInfo` is a cross-platform abstraction for the current
executable's own debug information, with a goal of minimal code bloat
and compilation speed penalty.
`std.debug.Dwarf` does not assume the current executable is itself the
thing being debugged, however, it does assume the debug info has the
same CPU architecture and OS as the current executable. It is planned to
remove this limitation.
This code has the hard-coded goal of supporting the executable's own
debug information and makes design choices along that goal, such as
memory-mapping the inputs, using dl_iterate_phdr, and doing conditional
compilation on the host target.
A more general-purpose implementation of debug information may be able
to share code with this, but there are some fundamental
incompatibilities. For example, the "SelfInfo" implementation wants to
avoid bloating the binary with PDB on POSIX systems, and likewise DWARF
on Windows systems, while a general-purpose implementation needs to
support both PDB and DWARF from the same binary. It might, for example,
inspect the debug information from a cross-compiled binary.
`SourceLocation` now lives at `std.debug.SourceLocation` and is
documented.
Deprecate `std.debug.runtime_safety` because it returns the optimization
mode of the standard library, when the caller probably wants to use the
optimization mode of their own module.
`std.pdb.Pdb` is moved to `std.debug.Pdb`, mirroring the recent
extraction of `std.debug.Dwarf` from `std.dwarf`.
I have no idea why we have both Module (with a Windows-specific
definition) and WindowsModule. I left some passive aggressive doc
comments to express my frustration.
std.debug.Dwarf is the parsing/decoding logic. std.dwarf remains the
unopinionated types and bits alone.
If you look at this diff you can see a lot less redundancy in
namespaces.
* Rename isPPC() -> isPowerPC32().
* Rename isPPC64() -> isPowerPC64().
* Add new isPowerPC() function which covers both.
There was confusion even in the standard library about what isPPC() meant. This
change makes these functions work how I think most people actually expect them
to work, and makes them consistent with isMIPS(), isSPARC(), etc.
I chose to rename from PPC to PowerPC because 1) it's more consistent with the
other functions, and 2) it'll cause loud rather than silent breakage for anyone
who might have been depending on isPPC() while misunderstanding it.