Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace scratch register with XOR swap #206

Open
bwburnsides opened this issue Dec 21, 2024 · 1 comment
Open

Replace scratch register with XOR swap #206

bwburnsides opened this issue Dec 21, 2024 · 1 comment

Comments

@bwburnsides
Copy link

bwburnsides commented Dec 21, 2024

I'm getting up to snuff on compiler backend details for a personal project and the regalloc2 overview and the follow on article have been great.

I'm not going to pretend to completely understand the details here, but the MachineEnv requires a scratch register for what I gather is typically swaps, symbolically like this.

let x = 4;
let y = 5;

let swap = x;
x = y;
y = swap;

However there is a way to swap two values without introducing a third variable/register which is via an XOR swap:

x = y ^ x;
y = x ^ y;
x = y ^ x;

You shouldn't do this in high level code but should be fine in machine code of course. If an ISA allows integer operations to be applied to float registers then this can be used for both classes.

I'm nearly certain you know of this already, but I wanted to get it out there for the off chance, and if nothing else get details on why it wouldn't work.

@cfallin
Copy link
Member

cfallin commented Dec 21, 2024

Thanks for the comment, @bwburnsides!

Yes, indeed, we're aware of this trick; it's not implemented in RA2 for a combination of reasons:

  • There is a layered approach to performing register moves and using a dedicated scratch is the last resort; and actually the scratch is optional:
    • First, if there are no cycles in a set of moves (a "parallel move"), we can shift values along linear move chains without any temporary space.
    • Then, if there is a move cycle but there is any register in the right register class free, we can use that register (and we opportunistically look for one).
    • Then, we can borrow a register that doesn't participate in a move cycle by spilling its contents to a spare spillslot (which we dynamically create on need).
    • Then, finally, we can use a dedicated scratch; this last case is quite rare.
  • The three-XOR trick obfuscates the data flow from the point of view of register renaming hardware: while it can track physical values through moves and wire up instruction dependencies appropriately in the out-of-order window, arithmetic properties of XORs are beyond the reach of the high-speed move-detection circuitry in renaming (as far as I know). This thus reduces performance (relative to a sequence of true moves) with false dependencies.
  • On x86, we could use xchg; this is a more promising route if we have a two-register move cycle with no spare space. We haven't pursued this.

So, tl;dr -- not much need, and dubious performance implications. You're welcome to experiment with this of course if you find a benchmark where the need appears!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants