Releases: kubkon/bold

v0.1.0

27 Dec 11:31

zld is now bold

I have renamed zld, first to emerald and then to bold, and made it a Mach-O-only linker. I had to face the harsh reality that I will not be able to develop and maintain multiple drivers at the same time, so I've decided to focus on Mach-O only. If anyone is interested in carrying on the work on the other drivers, the source code is archived and available in the kubkon/emerald-old repo. I also think that bold being 100% Mach-O only can be a nice complement to the mold linker (which is ELF-only). Wishful thinking, I know, maybe one day ;-)

Some numbers

Since landing #143, bold has been competitive with LLVM lld while still playing catch-up with the new Apple linker. The benchmark below tracks the time it takes to link the stage3 Zig compiler, which includes linking LLVM statically:

$ hyperfine ./bold.sh ./ld.sh ./ld_legacy.sh ./lld.sh
Benchmark 1: ./bold.sh
  Time (mean ± σ):      1.088 s ±  0.018 s    [User: 3.174 s, System: 1.004 s]
  Range (min … max):    1.039 s …  1.104 s    10 runs

Benchmark 2: ./ld.sh
  Time (mean ± σ):     491.8 ms ±  19.5 ms    [User: 1891.5 ms, System: 304.7 ms]
  Range (min … max):   458.1 ms … 509.9 ms    10 runs

Benchmark 3: ./ld_legacy.sh
  Time (mean ± σ):      2.132 s ±  0.013 s    [User: 3.242 s, System: 0.256 s]
  Range (min … max):    2.104 s …  2.150 s    10 runs

Benchmark 4: ./lld.sh
  Time (mean ± σ):      1.160 s ±  0.021 s    [User: 1.329 s, System: 0.247 s]
  Range (min … max):    1.133 s …  1.208 s    10 runs

Summary
  ./ld.sh ran
    2.21 ± 0.10 times faster than ./bold.sh
    2.36 ± 0.10 times faster than ./lld.sh
    4.33 ± 0.17 times faster than ./ld_legacy.sh

In the results:

  • bold.sh calls bold
  • ld.sh calls the rewritten Apple linker
  • ld_legacy.sh calls ld -ld_classic, the legacy Apple linker
  • lld.sh calls the LLVM lld linker
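
As a side note on reading the Summary block: the ± figures appear consistent with first-order error propagation over the two means and standard deviations, assuming the runs are independent. The sketch below recomputes them from the rounded numbers printed above; `speed_ratio` is a hypothetical helper name, and because the inputs are already rounded, the last digit may differ slightly from what hyperfine reports.

```python
import math

def speed_ratio(mean_slow, sd_slow, mean_fast, sd_fast):
    """Return (ratio, uncertainty) for mean_slow / mean_fast.

    Uses first-order propagation of the two standard deviations,
    assuming the two measurements are independent.
    """
    ratio = mean_slow / mean_fast
    rel = math.sqrt((sd_slow / mean_slow) ** 2 + (sd_fast / mean_fast) ** 2)
    return ratio, ratio * rel

# Means and standard deviations copied from the benchmark output (seconds).
ld = (0.4918, 0.0195)
others = {
    "bold.sh": (1.088, 0.018),
    "lld.sh": (1.160, 0.021),
    "ld_legacy.sh": (2.132, 0.013),
}

for name, (mean, sd) in others.items():
    ratio, err = speed_ratio(mean, sd, *ld)
    print(f"ld.sh ran {ratio:.2f} ± {err:.2f} times faster than {name}")
```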

What's Changed

I'm only going to list the most notable changes:

  1. output produced by the linker is now more compatible with Apple tooling. An example of this is sorting all relocs in descending order (an unwritten rule required by Apple) - #101
  2. some fixes for macOS 11 have also landed. I want bold to support as many older versions of macOS as possible but testing it is now a problem since GitHub is deprecating runners with older macOS versions - #103
  3. add support for visionOS - #129
  4. add support for merging cstrings and literals. This should produce output that has mergeable literals deduped and is thus smaller in size - #137
  5. speed up the linker by embracing a multi-threaded approach, which is probably the biggest woop of this release. We are now on par with the LLVM lld linker. The next target is to beat the new Apple linker - #143
  6. handle DWARF v1 all the way to v5 - #161
  7. handle enough flags to be able to link the Roc compiler with bold - #163
  8. handle -arch_multiple and -final_output flags, which effectively makes the linker compatible with lipo invoked directly by clang - #166 and #167
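
To illustrate item 4, here is a minimal sketch of what merging cstrings can look like. It is not bold's implementation: `merge_cstrings` and its data layout are hypothetical, and it assumes every input section is a well-formed sequence of NUL-terminated strings.

```python
def merge_cstrings(sections):
    """Merge NUL-terminated strings from several input sections.

    `sections` is a list of bytes objects, one per input object file.
    Returns (merged, remap) where remap[(file_index, old_offset)] is the
    offset of that string in the merged output.
    Assumes each section ends with a NUL terminator.
    """
    merged = bytearray()
    seen = {}    # string bytes (including NUL) -> offset in merged output
    remap = {}
    for file_index, data in enumerate(sections):
        offset = 0
        while offset < len(data):
            end = data.index(b"\x00", offset) + 1  # include the terminator
            string = bytes(data[offset:end])
            if string not in seen:
                seen[string] = len(merged)
                merged += string
            remap[(file_index, offset)] = seen[string]
            offset = end
    return bytes(merged), remap

a = b"hello\x00world\x00"
b = b"world\x00zig\x00"
merged, remap = merge_cstrings([a, b])
print(len(a) + len(b), "bytes in,", len(merged), "bytes out")
```

The `remap` table is what a linker would need in order to redirect references from the old per-file offsets to the single surviving copy.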

Full Changelog: v0.0.4...v0.1.0

macho: add -r support

08 Jan 22:16

Implement -r mode for the Mach-O linker.

Full Changelog: v0.0.3...v0.0.4

zld 0.0.3

23 Dec 06:59

A complete MachO driver rewrite + ELF x86_64 support

  • MachO driver has been fully redesigned, mainly so that we can easily plug it into Zig much like we did with the ELF driver.
  • This is the first tagged release of the ELF x86_64 driver.
  • Thanks to @Luukdegram we also get some Wasm linker support 🎉

Better MachO link times (on Apple Silicon)

It seems we had regressed in link times prior to the MachO rewrite. Below you will find a quick benchmark of linking Zig's stage4 binary. You can also see what Rui meant when he mentioned that Apple stepped up their game - Apple's stock linker crushed zld both before and after the rewrite, and it did the same to LLVM's LLD. That's good though - it's something we can strive for. The results are also good news for us since the rewrite is significantly faster than what we had before (zld_old is based on 19ccd5c FWIW), and I haven't even started thinking about parallelising the parsing of input objects, etc., a trick that all three linkers currently employ (Apple's ld, ld64.lld and Rui's sold linker).

$ hyperfine ./zld ./zld_old ./lld ./ld64
Benchmark 1: ./zld
  Time (mean ± σ):      3.236 s ±  0.014 s    [User: 3.460 s, System: 0.750 s]
  Range (min … max):    3.219 s …  3.269 s    10 runs

Benchmark 2: ./zld_old
  Time (mean ± σ):      4.419 s ±  0.011 s    [User: 4.671 s, System: 0.761 s]
  Range (min … max):    4.398 s …  4.430 s    10 runs

Benchmark 3: ./lld
  Time (mean ± σ):      1.497 s ±  0.019 s    [User: 1.700 s, System: 0.308 s]
  Range (min … max):    1.457 s …  1.529 s    10 runs

Benchmark 4: ./ld64
  Time (mean ± σ):     627.0 ms ±   7.2 ms    [User: 2140.1 ms, System: 385.3 ms]
  Range (min … max):   617.8 ms … 639.4 ms    10 runs

Summary
  ./ld64 ran
    2.39 ± 0.04 times faster than ./lld
    5.16 ± 0.06 times faster than ./zld
    7.05 ± 0.08 times faster than ./zld_old

The actual full linker invocation is as follows:

/Users/kubkon/dev/zld/zig-out/bin/ld64.zld -dynamic -platform_version macos 13.6.3 14.0 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -headerpad_max_install_names /Users/kubkon/dev/zig/build/zigcpp/libzigcpp.a /Users/kubkon/opt/llvm17-release/lib/libclangFrontendTool.a /Users/kubkon/opt/llvm17-release/lib/libclangCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libclangFrontend.a /Users/kubkon/opt/llvm17-release/lib/libclangDriver.a /Users/kubkon/opt/llvm17-release/lib/libclangSerialization.a /Users/kubkon/opt/llvm17-release/lib/libclangSema.a /Users/kubkon/opt/llvm17-release/lib/libclangStaticAnalyzerFrontend.a /Users/kubkon/opt/llvm17-release/lib/libclangStaticAnalyzerCheckers.a /Users/kubkon/opt/llvm17-release/lib/libclangStaticAnalyzerCore.a /Users/kubkon/opt/llvm17-release/lib/libclangAnalysis.a /Users/kubkon/opt/llvm17-release/lib/libclangASTMatchers.a /Users/kubkon/opt/llvm17-release/lib/libclangAST.a /Users/kubkon/opt/llvm17-release/lib/libclangParse.a /Users/kubkon/opt/llvm17-release/lib/libclangSema.a /Users/kubkon/opt/llvm17-release/lib/libclangBasic.a /Users/kubkon/opt/llvm17-release/lib/libclangEdit.a /Users/kubkon/opt/llvm17-release/lib/libclangLex.a /Users/kubkon/opt/llvm17-release/lib/libclangARCMigrate.a /Users/kubkon/opt/llvm17-release/lib/libclangRewriteFrontend.a /Users/kubkon/opt/llvm17-release/lib/libclangRewrite.a /Users/kubkon/opt/llvm17-release/lib/libclangCrossTU.a /Users/kubkon/opt/llvm17-release/lib/libclangIndex.a /Users/kubkon/opt/llvm17-release/lib/libclangToolingCore.a /Users/kubkon/opt/llvm17-release/lib/libclangExtractAPI.a /Users/kubkon/opt/llvm17-release/lib/libclangSupport.a /Users/kubkon/opt/llvm17-release/lib/liblldMinGW.a /Users/kubkon/opt/llvm17-release/lib/liblldELF.a /Users/kubkon/opt/llvm17-release/lib/liblldCOFF.a /Users/kubkon/opt/llvm17-release/lib/liblldWasm.a /Users/kubkon/opt/llvm17-release/lib/liblldMachO.a 
/Users/kubkon/opt/llvm17-release/lib/liblldCommon.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWindowsManifest.a /Users/kubkon/opt/llvm17-release/lib/libLLVMXRay.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLibDriver.a /Users/kubkon/opt/llvm17-release/lib/libLLVMDlltoolDriver.a /Users/kubkon/opt/llvm17-release/lib/libLLVMCoverage.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLineEditor.a /Users/kubkon/opt/llvm17-release/lib/libLLVMXCoreDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMXCoreCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMXCoreDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMXCoreInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMX86TargetMCA.a /Users/kubkon/opt/llvm17-release/lib/libLLVMX86Disassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMX86AsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMX86CodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMX86Desc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMX86Info.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWebAssemblyDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWebAssemblyAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWebAssemblyCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWebAssemblyUtils.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWebAssemblyDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMWebAssemblyInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMVEDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMVEAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMVECodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMVEDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMVEInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSystemZDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSystemZAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSystemZCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSystemZDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSystemZInfo.a 
/Users/kubkon/opt/llvm17-release/lib/libLLVMSparcDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSparcAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSparcCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSparcDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMSparcInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMRISCVTargetMCA.a /Users/kubkon/opt/llvm17-release/lib/libLLVMRISCVDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMRISCVAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMRISCVCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMRISCVDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMRISCVInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMPowerPCDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMPowerPCAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMPowerPCCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMPowerPCDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMPowerPCInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMNVPTXCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMNVPTXDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMNVPTXInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMSP430Disassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMSP430AsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMSP430CodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMSP430Desc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMSP430Info.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMipsDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMipsAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMipsCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMipsDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMMipsInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLoongArchDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLoongArchAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLoongArchCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLoongArchDesc.a 
/Users/kubkon/opt/llvm17-release/lib/libLLVMLoongArchInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLanaiDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLanaiCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLanaiAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLanaiDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMLanaiInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMHexagonDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMHexagonCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMHexagonAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMHexagonDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMHexagonInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMBPFDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMBPFAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMBPFCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMBPFDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMBPFInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAVRDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAVRAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAVRCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAVRDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAVRInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMARMDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMARMAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMARMCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMARMDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMARMUtils.a /Users/kubkon/opt/llvm17-release/lib/libLLVMARMInfo.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUTargetMCA.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUDisassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUAsmParser.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUCodeGen.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUDesc.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUUtils.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAMDGPUInfo.a 
/Users/kubkon/opt/llvm17-release/lib/libLLVMAArch64Disassembler.a /Users/kubkon/opt/llvm17-release/lib/libLLVMAArch64AsmPa...

zld 0.0.2

26 Oct 09:44

The major achievement of this release is rewriting the majority of the MachO linker in the spirit of data-oriented design, which led to

  • significantly reduced link times to the point where, dare I say, we start to become competitive with lld and ld64 - you can find some numbers below, and
  • reduced memory usage by avoiding unnecessary allocs, and instead re-parsing data when actually needed.

Some benchmarks

zld refers to our linker as a standalone binary, lld to LLVM's linker, and ld64 to Apple's linker. All in all, I should point out that we are still missing a number of optimisations in the linker such as cstring deduplication, compression of dynamic linker's relocations, and synthesising of the unwind info section, so this difference between us and other linkers will most likely shrink a little.

Results on M1 Pro

  • linking redis-server binary
❯ hyperfine ./zld ./lld ./ld64 --warmup 60
Benchmark 1: ./zld
  Time (mean ± σ):      35.6 ms ±   0.4 ms    [User: 35.9 ms, System: 10.4 ms]
  Range (min … max):    34.8 ms …  36.4 ms    79 runs
 
Benchmark 2: ./lld
  Time (mean ± σ):      49.2 ms ±   0.8 ms    [User: 42.6 ms, System: 17.6 ms]
  Range (min … max):    48.0 ms …  51.2 ms    59 runs
 
Benchmark 3: ./ld64
  Time (mean ± σ):      47.2 ms ±   0.5 ms    [User: 60.1 ms, System: 14.4 ms]
  Range (min … max):    46.2 ms …  48.1 ms    61 runs
 
Summary
  './zld' ran
    1.32 ± 0.02 times faster than './ld64'
    1.38 ± 0.03 times faster than './lld'
  • linking Zig's stage3 compiler
❯ hyperfine ./zld ./lld ./ld64 --warmup 5
Benchmark 1: ./zld
  Time (mean ± σ):      1.934 s ±  0.012 s    [User: 2.870 s, System: 0.468 s]
  Range (min … max):    1.923 s …  1.962 s    10 runs
 
Benchmark 2: ./lld
  Time (mean ± σ):      1.153 s ±  0.014 s    [User: 1.289 s, System: 0.230 s]
  Range (min … max):    1.141 s …  1.179 s    10 runs
 
Benchmark 3: ./ld64
  Time (mean ± σ):      2.349 s ±  0.006 s    [User: 3.875 s, System: 0.218 s]
  Range (min … max):    2.341 s …  2.357 s    10 runs
 
Summary
  './lld' ran
    1.68 ± 0.02 times faster than './zld'
    2.04 ± 0.02 times faster than './ld64'

Results on Intel i9

  • linking Zig's stage3 compiler
❯ hyperfine ./zld ./lld ./ld64 --warmup 5                                                                                                                                                                                          
Benchmark 1: ./zld
  Time (mean ± σ):      3.039 s ±  0.018 s    [User: 2.339 s, System: 0.671 s]
  Range (min … max):    3.000 s …  3.064 s    10 runs
 
Benchmark 2: ./lld
  Time (mean ± σ):      1.383 s ±  0.015 s    [User: 1.393 s, System: 0.483 s]
  Range (min … max):    1.363 s …  1.416 s    10 runs
 
Benchmark 3: ./ld64
  Time (mean ± σ):      2.090 s ±  0.018 s    [User: 3.000 s, System: 0.620 s]
  Range (min … max):    2.066 s …  2.126 s    10 runs
 
Summary
  './lld' ran
    1.51 ± 0.02 times faster than './ld64'
    2.20 ± 0.03 times faster than './zld'

Detailed overview of major changes

No relocs/code pre-parsing per Atom

Prior to this rewrite, we would pre-parse the code and relocs for each Atom (a subsection of an input section in a relocatable object file) and store the results on the heap. This is not only slow but also completely unnecessary: we can delay the work until we actually need it. This approach is now followed throughout.
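
A minimal sketch of the deferred approach, under stated assumptions: the `Atom` class and its field names are hypothetical and not zld's data structures. The atom records only where its code and relocation records live in the input file, and decodes the 8-byte Mach-O relocation records on demand. The decoding is deliberately simplified to the address and the low 24 bits (the symbol or section number).

```python
import struct
from dataclasses import dataclass

@dataclass
class Atom:
    """Hypothetical atom that records only where its data lives."""
    file_data: bytes   # raw bytes of the input object file
    code_off: int      # file offset of the atom's code
    code_len: int
    reloc_off: int     # file offset of the first relocation record
    reloc_count: int

    def code(self):
        # Slice on demand instead of copying into a per-atom heap buffer.
        return memoryview(self.file_data)[self.code_off:self.code_off + self.code_len]

    def relocs(self):
        # Decode the 8-byte records lazily, one at a time.
        for i in range(self.reloc_count):
            address, packed = struct.unpack_from(
                "<iI", self.file_data, self.reloc_off + 8 * i
            )
            yield address, packed & 0xFFFFFF  # (r_address, r_symbolnum)

# Toy "object file": 4 bytes of code followed by one relocation record.
data = b"\x90" * 4 + struct.pack("<iI", 0, 5 | (1 << 24))
atom = Atom(data, code_off=0, code_len=4, reloc_off=4, reloc_count=1)
for address, symbolnum in atom.relocs():
    print(address, symbolnum)
```

Nothing is allocated per atom beyond a handful of integers; the cost of decoding is paid only by the stages that actually walk the relocations.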

Linker now follows standard stages

Like lld, mold and ld64, we also implement linking in stages: first comes symbol resolution, then we parse input sections into atoms, then we do dead code stripping (if desired), then create synthetic atoms such as GOT cells, then create thunks if required, etc. This significantly simplified the entire linker, as each stage does very specialised work and no more.
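
The staged structure can be sketched as a list of functions run in order over shared state. This is a toy model with hypothetical names and only three stages (symbol resolution, atom parsing, dead code stripping); it assumes each input object is described by the symbols it defines and the symbols each of those references.

```python
def resolve_symbols(state):
    # Stage 1: the first definition of each name wins (simplified).
    for obj in state["objects"]:
        for name in obj["defines"]:
            state["symbols"].setdefault(name, obj["name"])

def parse_atoms(state):
    # Stage 2: one atom per winning symbol definition in this toy model.
    for obj in state["objects"]:
        for name in obj["defines"]:
            if state["symbols"][name] == obj["name"]:
                state["atoms"].append(
                    {"name": name, "refs": obj["refs"].get(name, [])}
                )

def dead_strip(state):
    # Stage 3: keep only atoms reachable from the entry point.
    by_name = {atom["name"]: atom for atom in state["atoms"]}
    live, work = set(), [state["entry"]]
    while work:
        name = work.pop()
        if name in live or name not in by_name:
            continue
        live.add(name)
        work.extend(by_name[name]["refs"])
    state["atoms"] = [a for a in state["atoms"] if a["name"] in live]

STAGES = [resolve_symbols, parse_atoms, dead_strip]

def link(objects, entry):
    state = {"objects": objects, "entry": entry, "symbols": {}, "atoms": []}
    for stage in STAGES:
        stage(state)
    return state

objects = [
    {"name": "a.o", "defines": ["_main", "_unused"], "refs": {"_main": ["_helper"]}},
    {"name": "b.o", "defines": ["_helper"], "refs": {}},
]
print([atom["name"] for atom in link(objects, "_main")["atoms"]])
```

Each stage reads what earlier stages produced and writes only its own output, which is what keeps the individual steps small.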

We do not store any code or relocs per synthetic atoms

Instead of generating the code and relocs for synthetic atoms (GOT, stubs, etc.), we only track their count, VM addresses and targets, and generate the code and apply relocations when writing the final image. In fact, we do not even need to track the addresses beyond the start and size of each synthetic section. I will refactor this in the future as well.
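
As a rough illustration, a synthetic GOT section can be modelled as nothing more than an ordered list of targets. The `GotSection` class below is a hypothetical sketch, not zld's code; it assumes 8-byte cells and that every target's final address is known at write time (in a real link, cells for dynamic symbols are bound at load time instead).

```python
import struct

class GotSection:
    """Hypothetical synthetic section: stores targets, not bytes."""
    ENTRY_SIZE = 8  # one 64-bit pointer per GOT cell

    def __init__(self):
        self.targets = []   # symbol names, in slot order
        self.index = {}     # symbol name -> slot number

    def add(self, symbol):
        # Reuse the existing slot if the symbol already has one.
        if symbol not in self.index:
            self.index[symbol] = len(self.targets)
            self.targets.append(symbol)
        return self.index[symbol]

    def size(self):
        return len(self.targets) * self.ENTRY_SIZE

    def address_of(self, symbol, section_start):
        # Derived from the section start, so no per-cell address is stored.
        return section_start + self.index[symbol] * self.ENTRY_SIZE

    def write(self, symbol_addresses):
        # Bytes are produced only now, when the final image is written.
        return b"".join(
            struct.pack("<Q", symbol_addresses[s]) for s in self.targets
        )

got = GotSection()
got.add("_a")
got.add("_b")
print(got.size(), hex(got.address_of("_b", 0x4000)))
```

Note how `address_of` needs only the section start and the slot number, which matches the observation that per-cell addresses do not have to be tracked.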

Thunks

While at it, I also went ahead and implemented range-extending thunks, which means we can now link larger programs on arm64 without erroring out in the linker. One word of explanation: contrary to what the issue suggested, we extend jump range via thunks rather than branch islands. For those unfamiliar, both methods extend the jump range for a given RISC ISA. Thunks, however, load the unreachable target's address into a scratch register and branch via that register. As such, a thunk is 12 bytes on arm64. Branch islands, on the other hand, are 4 bytes as they are simple bl #next_label instructions. Branch islands are thus short-range extenders: in order to jump further in the file, we chain the jumps by jumping between islands until reaching the actual target.
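
To make the range limit concrete: an arm64 B/BL instruction encodes a signed 26-bit word offset, so it reaches roughly ±128 MiB. The sketch below decides which calls need a thunk and assigns one 12-byte thunk per out-of-range target. The function names are hypothetical, and the sketch assumes the thunk area at `thunk_base` is itself within reach of every caller, which a real linker has to ensure by placing thunks between groups of atoms.

```python
BRANCH_RANGE = 1 << 27   # arm64 B/BL: signed 26-bit word offset, i.e. +/-128 MiB
THUNK_SIZE = 12          # three 4-byte instructions, as described above

def needs_thunk(source_addr, target_addr):
    """True if a direct B/BL at source_addr cannot reach target_addr."""
    delta = target_addr - source_addr
    return not (-BRANCH_RANGE <= delta < BRANCH_RANGE)

def place_thunks(calls, thunk_base):
    """Assign one thunk per out-of-range target.

    `calls` is a list of (source_addr, target_addr) pairs.
    Returns a dict mapping target_addr -> thunk address.
    """
    thunks = {}
    for source, target in calls:
        if needs_thunk(source, target) and target not in thunks:
            thunks[target] = thunk_base + len(thunks) * THUNK_SIZE
    return thunks

calls = [(0x0, 0x9000000), (0x4, 0x9000000), (0x8, 0x100)]
for target, thunk in place_thunks(calls, thunk_base=0x4000000).items():
    print(hex(target), "via thunk at", hex(thunk))
```

Because a thunk branches via a register holding the full target address, a single hop is enough regardless of distance, whereas branch islands would need to be chained.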

What's Changed

  • macho: improve linking speed, reduce memory usage by @kubkon in #9
  • elf: simplify like macho linker by @kubkon in #10
  • fixes: stage3 the new default by @kubkon in #11

Full Changelog: v0.0.1...v0.0.2

zld 0.0.1

26 Jul 18:19
3bd20cd

While this linker is still wildly incomplete, it has been used as the default MachO linker in the Zig toolchain since the 0.8.0 release. It should be pointed out that the two codebases do not share code in a simple way - I periodically backport changes/fixes between the two repos so that zld can be used effectively both as part of Zig and as a standalone linking driver.

For this reason, I have decided to cut the first release now that I have cleaned up argument parsing across linking drivers (Elf and MachO), and I will be working on advancing the Elf driver further.

What's Changed

  • coff: Add initial Coff support by @Pombal in #1
  • Update CI scripts to pull QEMU for Linux/ELF testing by @kubkon in #2
  • coff: Implement basic object parsing by @iddev5 in #4
  • Implement different zld drivers via symlinks and refactor usage/options per driver by @kubkon in #8
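
For readers unfamiliar with the symlink approach in #8, the general technique is to ship one binary and pick the driver from the name it was invoked under. The sketch below shows the idea only; of the names in the table, just ld64.zld appears in these notes, and the other entries are hypothetical placeholders.

```python
import os

# Only "ld64.zld" appears in these notes; the other names are hypothetical.
DRIVERS = {
    "ld64.zld": "macho",
    "ld.zld": "elf",
    "link-zld": "coff",
}

def pick_driver(argv0):
    """Choose a driver from the name the binary was invoked under."""
    name = os.path.basename(argv0)
    try:
        return DRIVERS[name]
    except KeyError:
        raise SystemExit(f"unknown driver name: {name}")

print(pick_driver("/Users/kubkon/dev/zld/zig-out/bin/ld64.zld"))
```

Each symlink points at the same executable, so per-driver usage text and option parsing can hang off the selected entry.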

Full Changelog: https://github.com/kubkon/zld/commits/v0.0.1