Performance Tips

Use the correct block types

put v.s. subroutine

While both blocks maps a subblock to a subset of qudits, their implementations are purposes are quite different. The put block applies the gate in a in-place manner, which requires the static matrix representation of its subblock. It works the best when the subblock is small.

The subroutine block is for running a sub-program in a subset of qubits. It first sets target qubits as active qubits using the focus! function, then apply the gates on active qubits. Finally, it unsets the active qubits with the relax! function.

julia> using Yao

julia> reg = rand_state(20);

julia> @time apply(reg, put(20, 1:6=>EasyBuild.qft_circuit(6)));  # second run
  0.070245 seconds (1.32 k allocations: 16.525 MiB)

julia> @time apply(reg, subroutine(20, EasyBuild.qft_circuit(6), 1:6));  # second run
  0.036840 seconds (1.07 k allocations: 16.072 MiB)

repeat v.s. put

repeat block is not only an alias of a chain of put, sometimes it can provide speed ups due to the different implementations.

julia> reg = rand_state(20);

julia> @time apply!(reg, repeat(20, X));
  0.002252 seconds (5 allocations: 656 bytes)

julia> @time apply!(reg, chain([put(20, i=>X) for i=1:20]));
  0.049362 seconds (82.48 k allocations: 4.694 MiB, 47.11% compilation time)

Other gates accelerated by repeat include: X, Y, Z, S, T, Sdag, and Tdag.

Diagonal matrix in time_evole

Register storage

One can use transposed storage and normal storage for computing batched registers. The transposed storage is used by default because it is often faster in practice. One can use transpose_storage to convert the storage.

Multithreading

Multithreading can be switched on by starting Julia in with a global environment variable JULIA_NUM_THREAD

$ JULIA_NUM_THREAD=4 julia xxx.jl

Check the Julia Multi-Treading manual for details.

GPU backend

The GPU backend is supported in CuYao.

julia> using Yao, CuYao

julia> reg = CuYao.cu(rand_state(20));

julia> circ = Yao.EasyBuild.qft_circuit(20);

julia> apply!(reg, circ)
ArrayReg{2, ComplexF64, CuArray...}
    active qubits: 20/20
    nlevel: 2