Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for generating subexperiments with LO's translated to a native gate set #517

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

caleb-johnson
Copy link
Collaborator

@caleb-johnson caleb-johnson commented Mar 28, 2024

Fixes #492

This PR introduces an EagleEquivalenceLibrary and HeronEquivalenceLibrary for optionally translating all the gates we use in cutting decompositions to the corresponding native gate set during creation of subexperiments. This allows users to transpile their "Cut Circuit" one time, and when they generate_cutting_experiments all of their subexperiments will already be translated to the native gate set for the processor type they specified. If they don't specify, the "standard" gate set will be used. Heron single qubit native gates are the same as Eagle, so we get that equivalence library for free.

TODO:

  • Decide how to allow users to specify a native gate set. Probably a string from {"standard", "eagle", "heron"}, which defaults to "standard".
  • Clean up logic in qpd around parsing the target device and getting the right equivalence library
  • Tests for the library and tests for the new option in generate_cutting_experiments
  • Release note

@coveralls
Copy link

coveralls commented Mar 28, 2024

Pull Request Test Coverage Report for Build 9084999560

Details

  • 73 of 73 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.09%) to 95.582%

Totals Coverage Status
Change from base Build 9084779766: 0.09%
Covered Lines: 3570
Relevant Lines: 3735

💛 - Coveralls

q = QuantumRegister(1, "q")
def_rx = QuantumCircuit(q)
theta = Parameter("theta")
for inst in [RZGate(np.pi / 2), SXGate(), RZGate(theta + np.pi), RZGate(5 * np.pi / 2)]:
Copy link
Collaborator Author

@caleb-johnson caleb-johnson Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Qiskit to find these translations, and I didn't change anything, even if it looked suspicious. I'll label the ones I think can be simplified here, and we can discuss. This one rotates 5/2pi times at the end. I believe we can just rotate 1/2pi?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakelishman , we should be able to simplify these strange rotation angles, right? I know the translation process uses a shortest-path algorithm, so I assume it can find paths containing these funky angles. Just wanted to be sure there wasn't something I was missing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're acting symbolically here, you can only really use the BasisTranslator, which acts on a gate-by-gate basis, which just dumbly multiplies/divides parameters as appropriate during conversion. We don't do symbolic peephole optimisation (though we could), and we can't do full unitary resynthesis with symbolic parameters, so you'll end up with things like this trying to decompose symbolically in general.

In this particular case, we clearly could add a specific rx -> [rz,sx] decomposition if we so chose (at the moment the default equivalence library does some funny roundabout thing). There's probably lots of places where we could improve the standard equivalence library for symbolic gates - we mostly just decompose to Euler angles and resynthesise into something more efficient during an optimisation pass.

# SXdgGate
q = QuantumRegister(1, "q")
def_sxdg = QuantumCircuit(q)
for inst in [
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suspicious one. Should be able to do a single pi rotation on either side of sx

q = QuantumRegister(1, "q")
def_ry = QuantumCircuit(q)
theta = Parameter("theta")
for inst in [SXGate(), RZGate(theta + np.pi), SXGate(), RZGate(3 * np.pi)]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to rotate by pi at the end here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that the reason there are factors of greater than 2pi in the equivalence library is because a rotation of 2pi doesn't actually bring the wavefunction to its original state; instead, e.g., for RZGate, it leads to the wavefunction picking up a global phase of -1. The rotation must be by an angle of 4pi to bring its state completely to where it began (see also: the mathematics of spinors). Even though a rotation by 2pi leads to a global phase of -1, this will not result in any difference in the state that is actually physically observable, but nonetheless Qiskit carefully keeps track of global phases, and I believe this is one instance of where that leads to some rotation angles that seem a bit atypical.

Here's a quick sanity check (in julia) given the RZGate definition of a rotation by 2pi:

In [1]: RZ(λ) = [exp(-im * λ / 2) 0; 0 exp(im * λ / 2)]
Out[1]: RZ (generic function with 1 method)

In [2]: RZ(2π)
Out[2]: 2×2 Matrix{ComplexF64}:
 -1.0-1.22465e-16im   0.0+0.0im
  0.0+0.0im          -1.0+1.22465e-16im

@caleb-johnson
Copy link
Collaborator Author

I was seeing some failures when I tried to use this in the roundtrip tests. For every roundtrip test, I would run the standard gate set and also the eagle gate set. 4 tests were failing. I suppose those tests should've been passing if all of our decompositions are logically equivalent.

I'd like to integrate the different equivalences into the roundtrip tests elegantly somehow. Would love to hear any suggestions. I also need to figure out why the tests are failing before merging this

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented Mar 29, 2024

I was seeing some failures when I tried to use this in the roundtrip tests. For every roundtrip test, I would run the standard gate set and also the eagle gate set. 4 tests were failing. I suppose those tests should've been passing if all of our decompositions are logically equivalent.

I'd like to integrate the different equivalences into the roundtrip tests elegantly somehow. Would love to hear any suggestions. I also need to figure out why the tests are failing before merging this

All the roundtrip tests pass with Eagle translated LO's except those including crx or csx gates. csx is just a specific instance of crx decomposition, so the bug can be traced back to a problem with translating the decompositions from crx gates.

@caleb-johnson caleb-johnson added this to the 0.7.0 milestone Mar 29, 2024
@@ -142,8 +142,9 @@ def test_cutting_exact_reconstruction(example_circuit):
subcircuits, bases, subobservables = partition_problem(
qc, "AAB", observables=observables_nophase
)
qpu = [None, "eagle", "heron"][np.random.randint(2)]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests are fine to make sure the equivalences can be accessed and used in the ways we intend, but the equivalence bugs that worry me are bugs involving indexing and expval correctness, so I really want to fold the equivalence testing into the roundtrip tests.

That being said, I didn't want to double the runtime of this test by making a whole new loop testing the equivalences, so I will just randomly translate subexperiments and make sure we can reconstruct the exact expval.

I will sample the standard gate set 50% of the time and the eagle/heron gateset 50% of the time

@caleb-johnson
Copy link
Collaborator Author

@garrison @ibrahim-shehzad this is ready for final review now.

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented Mar 30, 2024

All the roundtrip tests pass with Eagle translated LO's except those including crx or csx gates. csx is just a specific instance of crx decomposition, so the bug can be traced back to a problem with translating the decompositions from crx gates.

I had left off an SXGate from the translation, so the RXGate translation was simply incorrect. This is fixed now.

)

_eagle_sel = HeronEquivalenceLibrary = EagleEquivalenceLibrary = EquivalenceLibrary()
equivalence_libraries = defaultdict(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, if a user passes an unsupported basis, everything will run fine and their subexperiments will be in the standard gate set. Maybe we'd prefer to error if they pass in a QPU architecture we don't support or doesn't exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a user passes an unsupported basis

By that, do you mean that they have a QPDBasis that contains some gate(s) that are not supported by this equivalence library?

everything will run fine and their subexperiments will be in the standard gate set

What do you mean by "everything will run fine"? What will happen to the unsupported gates?

Copy link
Collaborator Author

@caleb-johnson caleb-johnson Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user passes basis_gate_set="nonsense", their gates will come out in the standard gate set defined in decompositions.py. In other words, it's a no-op

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that, do you mean that they have a QPDBasis that contains some gate(s) that are not supported by this equivalence library?

I mean they pass in a string that doesn't describe a supported QPU architecture. A string not in {"heron", "eagle"}

Copy link
Member

@garrison garrison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, one big comment: I wonder to what extent we could be relying on Qiskit for this functionality. We could get the same thing from the BasisTranslator, I believe. The problem you mentioned previously, from what I understand, is that the circuit -> dagcircuit -> circuit round-trip is inefficient. But, do we actually have data on how slow this overall process is compared to using the functionality introduced here? And if it is much slower, I wonder how much is because of the circuit <-> dagcircuit roundtrip vs. how much of it is the BasisTranslator itself.

If we were to use BasisTranslator, we would support any hardware that Qiskit does instead of having to add support for each backend's basis set individually. This feels like the wrong layer to do it if Qiskit already has that support -- and from the API perspective, the user has already chosen a backend once -- why should they have to specify again that it's Heron, Eagle, or whatever. But if Qiskit's benchmarks make this infeasible, we will have to work around it somehow, like this PR does.


Caleb's reply:

But, do we actually have data on how slow this overall process is compared to using the functionality introduced here?

Transpiling 1296 depth-4 133-qubit circuits takes 41 seconds to transpile in our cutting-pea experiments. So if these were utility-scale circuits that weren't simple graph states, you can multiply this by some factor; whereas, doing it the way introduced in this PR the relative runtime cost would be ~0.

@jakelishman
Copy link
Member

jakelishman commented Apr 1, 2024

For QuantumCircuit <-> DAGCircuit, just as a quick check: are you using copy_operations=False as appropriate when you are sure about your ownership (or not) of the inputs? You can almost certainly do dag_to_circuit(dag, copy_operations=False) on output, and if you have assumed ownership of the input, you can do it in circuit_to_dag too, which avoids all copies.

edit: Qiskit's PassManager.run copies on input but not on output, that is it assumes that it's borrowing the input data, but after that, any mutations are local, so at the exit point of run, it knows it owns its data so there's no need to copy out. I don't know what (if anything) BasePass.__call__ does on copy - I suspect nothing, since 95% of the time we forget that you can call an individual pass on a QuantumCircuit directly (which we probably ought to look at).

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented Apr 1, 2024

and from the API perspective, the user has already chosen a backend once

I'm not sure what you mean here. They've potentially transpiled their cut circuit using their backend in their local environment, but they haven't given a backend to CKT at this point in the workflow. They're using this new kwarg to communicate to CKT how their cut circuit has been transpiled so we can prepare their gate decompositions correctly

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented Apr 1, 2024

Instead of {"heron", "eagle"}, we could take the Backend directly and key on its sorted(backend.basis_gates) to find the right equivalence library.

EDIT: This is kind of a bad idea. Using these two strings makes everything very straightforward. The info we need is being passed explicitly from the user. Passing in a backend means we have to account for all the different backend classes and make sure we are extracting their basis gate sets correctly and making sure we have accounted for all the permutations of basis gates across Runtime, qiskit provider, and Qiskit generic backends.

Leaving this the way it is

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented Apr 3, 2024

@garrison I know we run some transpilation during creation of the subexperiments (e.g. RemoveFinalResets), so we already do the copies for each subexperiment. If you think it is more appropriate to handle the sampled gates at that time, that would be equally effective.

Totally open to adding these gates during transpilation or the way I have implemented here. I don't think users should have to do any extra copies for translation that can reasonably be avoided though, due to the volume of subexperiments.

@caleb-johnson caleb-johnson modified the milestones: 0.7.0, 0.8.0 Apr 16, 2024
@garrison
Copy link
Member

garrison commented May 7, 2024

@garrison I know we run some transpilation during creation of the subexperiments (e.g. RemoveFinalResets), so we already do the copies for each subexperiment. If you think it is more appropriate to handle the sampled gates at that time, that would be equally effective.

Obviously Caleb knows this already, but for anyone else watching this thread: we actually ended up removing these passes in #556 in order to improve performance, replacing them with functions that operate directly on the QuantumCircuit object instead of the DAGCircuit. The modifications made by these passes are so incredibly minor that the DAGCircuit representation isn't necessary, and removing the round-trip speeds up experiment generation significantly.

@garrison
Copy link
Member

garrison commented May 9, 2024

I am essentially on board with this.

Indeed, I have always desired for the bulk of transpilation to occur before generating the subexperiments, and my modification and test in #303 were meant to the be first step toward supporting that kind of workflow. (My recent PR, #573, creates a how-to guide for demonstrating this.)

However, obviously there are some steps which must be run after the subexperiments are generated. Previously, I've imagined that this will involve a pass manager that has a reduced set of passes, including some final basis translation together with some simple optimization passes (e.g., if two Hadamard gates appear on a qubit consecutively, they can be removed because $H^2 = I$). So, from my prior perspective, we should just bite the bullet with the DAGCircuit conversion, because translation might not be the only thing we need to do on the subexperiments.

I now think that there is a time where such passes on a DAGCircuit are appropriate and a time where they are not appropriate. The optimization passes I am imagining will only make minor changes to the subexperiments, and this might indeed make sense in a workflow when one is targeting the absolute best circuiits (analogous to setting optimization_level=3).

However, there are plenty of occasions where one doesn't want to wait to optimize the circuits to the best of the transpiler's ability, and the current PR is valuable for this reason. After all, we should aim to have the simple case be quick.

Note there are some limitations to this PR's approach, times at which conversion to DAGCircuit and back is probably going to be necessary, even if one does not want to optimize at the highest level. Ones I can think of right now:

  • if ancilla qubits are added during the QPD decomposition
  • if we support QPDs where some of the subexperiments can have gates with more than one qubit

Right now, we don't support either case, but maybe we will have to this about this in the future, e.g. if we want to provide direct support for cutting Toffoli gates (#258 (comment)).

Instead of {"heron", "eagle"}, we could take the Backend directly and key on its sorted(backend.basis_gates) to find the right equivalence library.

EDIT: This is kind of a bad idea. Using these two strings makes everything very straightforward. The info we need is being passed explicitly from the user. Passing in a backend means we have to account for all the different backend classes and make sure we are extracting their basis gate sets correctly and making sure we have accounted for all the permutations of basis gates across Runtime, qiskit provider, and Qiskit generic backends.

Actually, IMO taking the Backend directly would actually be preferable. Needing to pass the backend type as a string violates the don't-repeat-yourself philosophy and forces the user to know that ibm_sherbrooke is an "eagle" backend and ibm_torino is a "heron" backend, among others. And if they switch backends, they will now have two places to update in the code. (I could see this leading to very annoying errors when one forgets to modify both places, or they otherwise go out-of-sync.)

If the target instruction set of the backend matches something we recognize, then we can just use the equivalence library in this PR. If it does not match something we recognize, we can warn the user that the slow code path is being used, then allow Qiskit to do the basis translation. This way, the user will always get something that works, but will benefit from the improved performance on backends that we explicitly have written optimized equivalence libraries for.

And maybe eventually some of this work can influence Qiskit, such that we don't even have to have our own equivalences here and can just rely on the ones in Qiskit to be as good as ours. Some day.

@garrison
Copy link
Member

In a conversation with @mtreinish, he suggested to me that we should be able to use the standard equivalence library provided by Qiskit. If it is lacking some desired equivalences, then we can always copy the object and add our own equivalances to our library's copy, then use that as we see fit. Or, if there are equivalences that are not in Qiskit, they are likely to be welcome upstream too. Either way, this allows us to avoid using the BasisTranslator pass.

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented May 13, 2024

In a conversation with @mtreinish, he suggested to me that we should be able to use the standard equivalence library provided by Qiskit. If it is lacking some desired equivalences, then we can always copy the object and add our own equivalances to our library's copy, then use that as we see fit. Or, if there are equivalences that are not in Qiskit, they are likely to be welcome upstream too. Either way, this allows us to avoid using the BasisTranslator pass.

How can we use the sel if the returned gates are in the standard gate set? That is the problem we are trying to correct. Our QPD gates are also in that gate set. This PR defines a new equivalence library -- one for each architecture we support. We don't need to build any optimization on top of this library because we just define one translation per gate we use in QPD, so we just have O(1) translation of our gates as they come out of the QPDBasis objects and into our subexperiments

@garrison
Copy link
Member

In a conversation with @mtreinish, he suggested to me that we should be able to use the standard equivalence library provided by Qiskit. If it is lacking some desired equivalences, then we can always copy the object and add our own equivalances to our library's copy, then use that as we see fit. Or, if there are equivalences that are not in Qiskit, they are likely to be welcome upstream too. Either way, this allows us to avoid using the BasisTranslator pass.

How can we use the sel if the returned gates are in the standard gate set? That is the problem we are trying to correct. Our QPD gates are also in that gate set. This PR defines a new equivalence library -- one for each architecture we support. We don't need to build any optimization on top of this library because we just define one translation per gate we use in QPD, so we just have O(1) translation of our gates as they come out of the QPDBasis objects and into our subexperiments

Sorry, after talking to @mtreinish, I had thought the Dijkstra's algorithm search was in EquivalenceLibrary, but this is wrong; it looks like it is only in BasisTranslator. The idea was that if the direct mapping we want is in the EquivalenceLibrary, then it will still take O(1) time because it will avoid the Dijkstra search. This part would still be possible if we were using BasisTranslator with an improved equivalence library, but that would require a DAGCircuit conversion and back.

@mtreinish
Copy link
Member

mtreinish commented May 18, 2024

If you're trying to avoid using the basis translator and the dag then creating an equivalence library is not needed. You should just use a python dictionary. Under the covers the advantage the EquivalenceLibrary class provides you is that it builds a rustworkx graph of all the equivalence relationships in the library which makes things like doing a Dijkstra search to find a translation between arbitrary gates easy. If you're just looking for a static lookup map of 1:1 mappings, then the extra machinery in the EquivalenceLibrary class is unnecessary overhead. You can just do something like:

eagle_y_circ = QuantumCircuit(1)
eagle_y.rz(math.pi, 0)
eagle_y.x(0)
eagle_translation_dict = {
    "y": eagle_y
}

which will be faster and simpler. Also the other thing you can do is use the synthesis library to generate the definitions if you're not confident in the decomposition being good. Something like:

from qiskit.synthesis import OneQubitEulerDecomposer,  TwoQubitBasisDecomposer
from qiskit.circuit.library import ECRGate, CZGate, YGate

one_qubit_decomp = OneQubitEulerDecomposer("ZSXX")
eagle_decomp_two = TwoQubitBasisDecomposer(ECRGate(), euler_basis="ZSXX")
heron_decomp_two = TwoQubitBasisDecomposer(CZGate(), euler_basis="ZSXX")

eagle_translation_dict = {
    "y": one_qubit_decomp(YGate()),
    "cz": eagle_decomp_two(CZGate()),
}
heron_translation_dict = {
    "y": eagle_translation_dict["y"],
   "ecr": heron_decomp_two(ECRGate()),
}

This won't always give the optimal decomposition, but it simplifies it as much as the transpiler can by default. For example, SwapGate -> CZGate can done as:

def_swap_cz = QuantumCircuit(q, global_phase=-pi / 2)
def_swap_cz.sx(0)
def_swap_cz.sx(1)
def_swap_cz.cz(0, 1)
def_swap_cz.sx(0)
def_swap_cz.sx(1)
def_swap_cz.cz(0, 1)
def_swap_cz.sx(0)
def_swap_cz.sx(1)
def_swap_cz.cz(0, 1)

but the decomposition from TwoQubitBasisDecomposer has a few extra 1q gates over that.

@caleb-johnson
Copy link
Collaborator Author

caleb-johnson commented May 20, 2024

If you're trying to avoid using the basis translator and the dag then creating an equivalence library is not needed.

Thanks Matthew, the reason I didn't do that is because the parameters pass through nicely from the input gate to the translated representation. I'm sure that functionality can be recreated w a normal Python dict, but I'd have to think how.

EDIT: I think this is actually really easy. I just hadn't used python dicts like this before :)

@caleb-johnson caleb-johnson modified the milestones: 0.8.0, 0.9.0 May 22, 2024
@@ -214,7 +224,13 @@ def _decompose_qpd_instructions(
for data in inst.operation.definition.data:
# Can ignore clbits here, as QPDGates don't use clbits directly
assert data.clbits == ()
tmp_data.append(CircuitInstruction(data.operation, qubits=[qubits[0]]))
if basis_gate_set is None or data.operation.name in {"qpd_measure"}:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that comes out of QPDBasis can be translated except our measurement objects. Those never need to be translated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for translation into a hardware-native gate set during creation of subexperiments
5 participants