Quantum Circuit Born Machine
Yao is designed with variational quantum circuits in mind. This tutorial introduces how to use Yao for this kind of task by implementing the quantum circuit Born machine described in Jin-Guo Liu, Lei Wang (2018).
Let's load the packages first:
using Yao, LinearAlgebra, Plots
Training Target
In this tutorial, we will ask the variational circuit to learn the most basic distribution: a Gaussian distribution. It is defined as follows:
\[f(x \left| \mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
We implement it as gaussian_pdf, which evaluates the density on a grid and normalizes the values so that they sum to 1:
function gaussian_pdf(x, μ::Real, σ::Real)
    pl = @. 1 / sqrt(2pi * σ^2) * exp(-(x - μ)^2 / (2 * σ^2))
    pl / sum(pl)
end
pg = gaussian_pdf(1:1<<6, 1<<5-0.5, 1<<4);
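As a quick sanity check, here is a minimal sketch that only repeats the definition above and inspects the result. The discretized distribution should have 64 entries and sum to 1. Because the mean 31.5 lies halfway between the grid points 31 and 32, those two points should share the largest probability.

```julia
# Same definition as in the tutorial, repeated so the snippet is self-contained.
function gaussian_pdf(x, μ::Real, σ::Real)
    pl = @. 1 / sqrt(2pi * σ^2) * exp(-(x - μ)^2 / (2 * σ^2))
    pl / sum(pl)
end

# 1<<6 == 64 grid points, mean 1<<5 - 0.5 == 31.5, standard deviation 1<<4 == 16
pg = gaussian_pdf(1:1<<6, 1<<5-0.5, 1<<4)

println("number of points:  ", length(pg))
println("total probability: ", sum(pg))
println("pg[31] ≈ pg[32]:   ", pg[31] ≈ pg[32])
```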
We can plot the distribution. It looks like this:
Plots.plot(pg)
Create the Circuit
A quantum circuit Born machine is composed of two kinds of layers: rotation layers and entangler layers.
Rotation Layer
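An arbitrary single-qubit rotation is built from parameterized rotation gates about the Z, X, and Z axes, each with its own angle (products are written in the order the gates are applied):
\[Rz(\theta_1) \cdot Rx(\theta_2) \cdot Rz(\theta_3)\]
Our input is the $|0\dots 0\rangle$ state, on which a leading $Rz$ only contributes a phase, so the first layer of arbitrary rotations can just use $Rx(\theta_1) \cdot Rz(\theta_2)$. Likewise, a trailing $Rz$ does not change the measured probabilities, so the last layer of arbitrary rotations can just use $Rz(\theta_1)\cdot Rx(\theta_2)$.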
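The two simplifications can be checked numerically with plain 2×2 matrices. This is a minimal sketch that does not use Yao; it assumes the conventions $Rx(\theta) = e^{-i\theta X/2}$ and $Rz(\theta) = e^{-i\theta Z/2}$, and the helper names rx and rz are chosen here for illustration only.

```julia
# Assumed conventions: Rx(θ) = exp(-iθX/2) and Rz(θ) = exp(-iθZ/2).
rx(θ) = ComplexF64[cos(θ/2) (-im*sin(θ/2)); (-im*sin(θ/2)) cos(θ/2)]
rz(θ) = ComplexF64[exp(-im*θ/2) 0; 0 exp(im*θ/2)]

ψ0 = ComplexF64[1, 0]           # the |0⟩ state
θ1, θ2, θ3 = 0.3, 1.1, -0.7     # arbitrary test angles

# In matrix notation the rightmost factor acts first:
# Rz(θ1), then Rx(θ2), then Rz(θ3).
full        = rz(θ3) * rx(θ2) * rz(θ1) * ψ0
no_leading  = rz(θ3) * rx(θ2) * ψ0
no_trailing = rx(θ2) * rz(θ1) * ψ0

# Leading Rz on |0⟩: the state changes only by the global phase exp(-iθ1/2).
println(full ≈ exp(-im * θ1 / 2) * no_leading)
# Trailing Rz: the measurement probabilities are unchanged.
println(abs2.(full) ≈ abs2.(no_trailing))
```

Both checks should print true, which is why the first and last layers need only two rotations per qubit.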
In 幺, every Hilbert operator is a block type. This includes all quantum gates and quantum oracles. In general, the operators that appear in a quantum circuit can be divided into Composite Blocks and Primitive Blocks.
We follow the low abstraction principle, and thus each block represents a certain approach to calculation. The simplest Composite Block is a Chain Block, which chains other blocks (oracles) with the same number of qubits together. It is just a simple mathematical composition of operators of the same size, e.g.
\[\text{chain(X, Y, Z)} \iff X \cdot Y \cdot Z\]
We can construct an arbitrary rotation block by chaining $Rz$, $Rx$, $Rz$ together.
chain(Rz(0.0), Rx(0.0), Rz(0.0))
nqubits: 1
chain
├─ rot(Z, 0.0)
├─ rot(X, 0.0)
└─ rot(Z, 0.0)
Rx and Rz construct new rotation gates; they are just shorthands for rot(X, 0.0), etc.
Then let's chain them up
layer(nbit::Int, x::Symbol) = layer(nbit, Val(x))
layer(nbit::Int, ::Val{:first}) = chain(nbit, put(i=>chain(Rx(0), Rz(0))) for i = 1:nbit);
We do not need to feed the first parameter n into put here. All factory methods can lazily evaluate their first argument, which is the number of qubits: when it is omitted, they return a lambda function that requires a single integer input. The instance of the desired block is only constructed once all the information is filled in. As long as the number of qubits is given somewhere in the declaration, 幺 is able to infer the rest. We will now define the rest of the rotation layers:
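The difference between the explicit and the lazy form can be sketched as follows. This assumes, as the text describes, that put accepts the qubit count as its first argument and that chain(n, ...) passes n on to lazily constructed blocks; the qubit count 3 and the gate choices are arbitrary.

```julia
using Yao

# Explicit form: the qubit count is the first argument.
explicit = put(3, 2 => Rx(0.0))

# Lazy form: no qubit count is given to put; chain(3, ...) supplies it.
circuit = chain(3, put(1 => Rx(0.0)), put(2 => Rz(0.0)))

println(nqubits(explicit))   # number of qubits of the explicit block
println(nqubits(circuit))    # number of qubits inferred for the chain
println(length(circuit))     # number of blocks in the chain
```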
layer(nbit::Int, ::Val{:last}) = chain(nbit, put(i=>chain(Rz(0), Rx(0))) for i = 1:nbit)
layer(nbit::Int, ::Val{:mid}) = chain(nbit, put(i=>chain(Rz(0), Rx(0), Rz(0))) for i = 1:nbit);
Entangler
Another component of the quantum circuit Born machine is the entangler: several CNOT operators applied to different pairs of qubits.
entangler(pairs) = chain(control(ctrl, target=>X) for (ctrl, target) in pairs);
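The pairs argument is just a collection of control => target pairs. As an illustration, the hypothetical helper ring_pairs below (not part of Yao) generates a ring connectivity in which each qubit controls its neighbor and the last qubit wraps around to the first. For six qubits this yields the same six pairs that the training section uses, in a different order.

```julia
# Hypothetical helper (not part of Yao): CNOT pairs connecting each qubit to its
# neighbor on a ring, with the last qubit wrapping around to the first.
ring_pairs(n::Int) = [i => mod1(i + 1, n) for i in 1:n]

pairs6 = ring_pairs(6)
println(pairs6)
```

Since CNOT gates on overlapping qubits do not commute in general, the order of the pairs can matter, so this sketch is not a drop-in replacement for the hand-written list.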
We can then define such a Born machine:
function build_circuit(n, nlayers, pairs)
    circuit = chain(n)
    # first rotation layer (no leading Rz)
    push!(circuit, layer(n, :first))
    # alternate entanglers and full rotation layers
    for i in 2:nlayers
        push!(circuit, cache(entangler(pairs)))
        push!(circuit, layer(n, :mid))
    end
    # final entangler and last rotation layer (no trailing Rz)
    push!(circuit, cache(entangler(pairs)))
    push!(circuit, layer(n, :last))
    return circuit
end
build_circuit (generic function with 1 method)
We use the method cache here to tag the entangler block so that it is cached after its first run, because it is actually a constant oracle. Let's see what will be constructed:
build_circuit(4, 1, [1=>2, 2=>3, 3=>4])
nqubits: 4
chain
├─ chain
│  ├─ put on (1)
│  │  └─ chain
│  │     ├─ rot(X, 0.0)
│  │     └─ rot(Z, 0.0)
│  ├─ put on (2)
│  │  └─ chain
│  │     ├─ rot(X, 0.0)
│  │     └─ rot(Z, 0.0)
│  ├─ put on (3)
│  │  └─ chain
│  │     ├─ rot(X, 0.0)
│  │     └─ rot(Z, 0.0)
│  └─ put on (4)
│     └─ chain
│        ├─ rot(X, 0.0)
│        └─ rot(Z, 0.0)
├─ [cached] chain
│  ├─ control(1)
│  │  └─ (2,) X
│  ├─ control(2)
│  │  └─ (3,) X
│  └─ control(3)
│     └─ (4,) X
└─ chain
   ├─ put on (1)
   │  └─ chain
   │     ├─ rot(Z, 0.0)
   │     └─ rot(X, 0.0)
   ├─ put on (2)
   │  └─ chain
   │     ├─ rot(Z, 0.0)
   │     └─ rot(X, 0.0)
   ├─ put on (3)
   │  └─ chain
   │     ├─ rot(Z, 0.0)
   │     └─ rot(X, 0.0)
   └─ put on (4)
      └─ chain
         ├─ rot(Z, 0.0)
         └─ rot(X, 0.0)
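The size of the constructed circuit follows directly from build_circuit. The helpers below are hypothetical (not part of Yao) and simply count top-level blocks and rotation parameters as a function of the number of qubits and layers.

```julia
# Hypothetical helpers (not part of Yao) that count what build_circuit produces.
# Top-level blocks: first layer + (nlayers - 1) × (entangler, mid layer)
# + final entangler + last layer.
count_blocks(nlayers::Int) = 1 + 2 * (nlayers - 1) + 2

# Parameters: 2 per qubit in the first and last layers, 3 per qubit in each mid layer.
count_parameters(n::Int, nlayers::Int) = n * (2 + 3 * (nlayers - 1) + 2)

println(count_blocks(1), " blocks, ", count_parameters(4, 1), " parameters")
println(count_blocks(10), " blocks, ", count_parameters(6, 10), " parameters")
```

For the circuit printed above (4 qubits, 1 layer) this gives 3 blocks and 16 parameters, which matches the tree. The rotation layers always occupy the odd positions 1, 3, 5, ... of the top-level chain, with the entanglers in between.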
MMD Loss & Gradients
The MMD loss is described below:
\[\begin{aligned} \mathcal{L} &= \left| \sum_{x} p_{\theta}(x) \phi(x) - \sum_{x} \pi(x) \phi(x) \right|^2\\ &= \langle K(x, y) \rangle_{x \sim p_{\theta}, y\sim p_{\theta}} - 2 \langle K(x, y) \rangle_{x\sim p_{\theta}, y\sim \pi} + \langle K(x, y) \rangle_{x\sim\pi, y\sim\pi} \end{aligned}\]
We will use a squared exponential kernel here.
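struct RBFKernel
    σ::Float64
    m::Matrix{Float64}    # kernel matrix K(x, y) over all pairs of basis states
end
function RBFKernel(σ::Float64, space)
    dx2 = (space .- space').^2
    # note: -1/2σ parses as -1/(2σ), so the entries are exp(-dx2 / (2σ))
    return RBFKernel(σ, exp.(-1/2σ * dx2))
end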
kexpect(κ::RBFKernel, x, y) = x' * κ.m * y
kexpect (generic function with 1 method)
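To see how the kernel expectation relates to the two lines of the loss formula, the sketch below compares the compact form (kernel expectation of the difference p - target with itself) against the three-term expansion. It is a self-contained illustration on 8 basis states with made-up distributions, and it uses a plain matrix K in place of the RBFKernel struct.

```julia
# Small illustrative setup: 8 basis states instead of 64.
space = 0:7
σ = 0.25
K = exp.(-(space .- space') .^ 2 ./ (2σ))   # same formula as the kernel matrix above

kexpect(K, x, y) = x' * K * y

p = fill(1 / 8, 8)                 # a uniform model distribution (made up)
w = [1, 2, 3, 4, 4, 3, 2, 1]
target = w ./ sum(w)               # an arbitrary target distribution (made up)

compact  = kexpect(K, p - target, p - target)
expanded = kexpect(K, p, p) - 2 * kexpect(K, p, target) + kexpect(K, target, target)

println(compact ≈ expanded)        # the two forms of the loss agree
println(compact > 0)               # positive when the distributions differ
```

Because the kernel matrix is symmetric, the cross term appears twice in the expansion, which is where the factor 2 comes from.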
There are two different ways to define the loss.
In simulation, we can use the probability distribution of the state directly:
get_prob(qcbm) = probs(zero_state(nqubits(qcbm)) |> qcbm)
function loss(κ, c, target)
    p = get_prob(c) - target
    return kexpect(κ, p, p)
end
loss (generic function with 1 method)
Alternatively, if you want to simulate the whole process with measurements (which is what a physical device would do), you should define the loss in terms of measurement results. For convenience, we directly use the simulated probabilities in our loss.
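A measurement-based variant would replace the exact probabilities with an empirical distribution estimated from a finite number of shots. The sketch below illustrates only that estimation step in plain Julia, without Yao. The helper empirical_distribution and the 4-outcome distribution are hypothetical stand-ins for real measurement outcomes.

```julia
using Random

# Draw `nshots` indices from the discrete distribution `p` by inverting its CDF,
# then return the normalized histogram (the empirical distribution).
function empirical_distribution(rng::AbstractRNG, p::AbstractVector, nshots::Int)
    cdf = cumsum(p)
    counts = zeros(Int, length(p))
    for _ in 1:nshots
        # the clamp guards against rounding when cdf[end] is slightly below 1
        idx = min(searchsortedfirst(cdf, rand(rng)), length(p))
        counts[idx] += 1
    end
    return counts ./ nshots
end

rng = MersenneTwister(1234)
p_exact = [0.1, 0.2, 0.3, 0.4]
p_est = empirical_distribution(rng, p_exact, 10_000)

println(p_est)
println(sum(p_est) ≈ 1)
```

The estimate could then be passed to kexpect in place of the exact probabilities, at the cost of statistical noise that shrinks as the number of shots grows.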
Gradients
The gradient of the MMD loss is
\[\begin{aligned} \frac{\partial \mathcal{L}}{\partial \theta^i_l} &= \langle K(x, y) \rangle_{x\sim p_{\theta^+}, y\sim p_{\theta}} - \langle K(x, y) \rangle_{x\sim p_{\theta^-}, y\sim p_{\theta}}\\ &- \langle K(x, y) \rangle _{x\sim p_{\theta^+}, y\sim\pi} + \langle K(x, y) \rangle_{x\sim p_{\theta^-}, y\sim\pi} \end{aligned}\]
which can be implemented as
function gradient(qcbm, κ, ptrain)
    n = nqubits(qcbm)
    prob = get_prob(qcbm)
    grad = zeros(Float64, nparameters(qcbm))
    count = 1
    # rotation layers sit at the odd positions of the circuit
    for k in 1:2:length(qcbm), each_line in qcbm[k], gate in content(each_line)
        dispatch!(+, gate, π/2)    # shift the parameter to θ + π/2
        prob_pos = probs(zero_state(n) |> qcbm)

        dispatch!(-, gate, π)      # shift the parameter to θ - π/2
        prob_neg = probs(zero_state(n) |> qcbm)

        dispatch!(+, gate, π/2) # set back
        grad_pos = kexpect(κ, prob, prob_pos) - kexpect(κ, prob, prob_neg)
        grad_neg = kexpect(κ, ptrain, prob_pos) - kexpect(κ, ptrain, prob_neg)
        grad[count] = grad_pos - grad_neg
        count += 1
    end
    return grad
end
gradient (generic function with 1 method)
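The four-term formula can be sanity-checked on a model small enough to differentiate by hand. The sketch below uses a hypothetical one-parameter model, a single qubit rotated by Rx(θ) from |0⟩ so that the outcome probabilities are cos²(θ/2) and sin²(θ/2), together with an arbitrary 2×2 kernel, and compares the shifted-parameter gradient with a central finite difference. It does not use Yao.

```julia
# One-parameter toy model: outcome probabilities of a single qubit after Rx(θ)|0⟩.
model(θ) = [cos(θ / 2)^2, sin(θ / 2)^2]

K = [1.0 0.5; 0.5 1.0]          # an arbitrary symmetric kernel matrix
target = [0.3, 0.7]             # an arbitrary target distribution
kexpect(x, y) = x' * K * y
loss(θ) = kexpect(model(θ) - target, model(θ) - target)

# Gradient from the four-term formula with parameter shifts of ±π/2.
function shift_gradient(θ)
    p, p_pos, p_neg = model(θ), model(θ + π / 2), model(θ - π / 2)
    return kexpect(p, p_pos) - kexpect(p, p_neg) -
           kexpect(target, p_pos) + kexpect(target, p_neg)
end

θ = 0.8
h = 1e-6
finite_difference = (loss(θ + h) - loss(θ - h)) / (2h)

println(shift_gradient(θ))
println(finite_difference)
```

The two numbers should agree up to the finite-difference error. No extra factor appears in the formula because the factor 2 from differentiating the quadratic loss cancels the factor 1/2 in the parameter-shift rule.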
Now let's set up the training:
import Optimisers
qcbm = build_circuit(6, 10, [1=>2, 3=>4, 5=>6, 2=>3, 4=>5, 6=>1])
dispatch!(qcbm, :random) # initialize the parameters
κ = RBFKernel(0.25, 0:2^6-1)
pg = gaussian_pdf(1:1<<6, 1<<5-0.5, 1<<4);
opt = Optimisers.setup(Optimisers.ADAM(0.01), parameters(qcbm));
function train(qcbm, κ, opt, target)
    history = Float64[]
    for _ in 1:100
        push!(history, loss(κ, qcbm, target))
        ps = parameters(qcbm)
        Optimisers.update!(opt, ps, gradient(qcbm, κ, target))
        dispatch!(qcbm, ps)
    end
    return history
end
history = train(qcbm, κ, opt, pg)
trained_pg = probs(zero_state(nqubits(qcbm)) |> qcbm)
64-element Vector{Float64}:
0.004508977358615008
0.003975216566920962
0.005746123187879088
0.005749329985145581
0.006580664943765955
0.006866068131531995
0.008311317697904488
0.008736294128910029
0.009650801303891883
0.010412444759191704
0.011893511697900344
0.01230144698659008
0.013619372287743545
0.014140933819417556
0.015509107171862139
0.016260584185992348
0.017443608782407298
0.01839524464492458
0.019290292456281822
0.020329428860274903
0.021319218831436223
0.021924746634115408
0.02250385379087287
0.0235540791583426
0.024009631505124942
0.024537851419924893
0.02503757588424172
0.025445261738574053
0.025667087393993208
0.02610193688901716
0.026165739102211406
0.026188321230937416
0.025837130477500216
0.025761210807597305
0.025581855449850616
0.024947940622970382
0.02477388340013024
0.024196457190097262
0.023360001961423408
0.02271014732413477
0.021985065618614248
0.02107170600378278
0.020187375812990502
0.019199511697198978
0.01848531685636374
0.017268319300008543
0.01643630592372164
0.015512773555483935
0.014441562601076804
0.013300491776597196
0.012564302130441287
0.011729789173770646
0.010349679937628323
0.009846534467530666
0.009045354399092372
0.007931150520443976
0.007607588225093095
0.006471820385909556
0.006037116425143407
0.005788003626769444
0.0036900423425330737
0.0048658399295142315
0.00304827149190584
0.003791378028740338
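To make the update performed in each training step more concrete, here is a hand-written Adam loop applied to a toy quadratic loss. This is an illustrative re-implementation with commonly used default hyperparameters, not the code of Optimisers.jl. The names adam_minimize, toy_loss, and toy_grad are hypothetical.

```julia
# Hand-written Adam update with commonly used defaults, applied to a toy
# quadratic loss instead of the circuit.
function adam_minimize(grad, x0; η = 0.01, β1 = 0.9, β2 = 0.999, ϵ = 1e-8, steps = 100)
    x = copy(x0)
    m = zero(x)                        # running mean of the gradients
    v = zero(x)                        # running mean of the squared gradients
    for t in 1:steps
        g = grad(x)
        m .= β1 .* m .+ (1 - β1) .* g
        v .= β2 .* v .+ (1 - β2) .* g .^ 2
        m_hat = m ./ (1 - β1^t)        # bias-corrected first moment
        v_hat = v ./ (1 - β2^t)        # bias-corrected second moment
        x .-= η .* m_hat ./ (sqrt.(v_hat) .+ ϵ)
    end
    return x
end

center = [0.5, -1.5]
toy_loss(x) = sum(abs2, x .- center)
toy_grad(x) = 2 .* (x .- center)

x0 = [1.0, -2.0]
x_final = adam_minimize(toy_grad, x0)
println(toy_loss(x0), " -> ", toy_loss(x_final))
```

In the tutorial's train function, the role of toy_grad is played by gradient(qcbm, κ, target), and the updated parameters are written back into the circuit with dispatch!.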
The training history looks like this:
Plots.plot(history)
title!("training history")
xlabel!("steps"); ylabel!("loss")
And let's check what we got
fig2 = Plots.plot(1:1<<6, trained_pg; label="trained")
Plots.plot!(fig2, 1:1<<6, pg; label="target")
title!("distribution")
xlabel!("x"); ylabel!("p")
So within 100 steps, we got a pretty close estimate of our target distribution!
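Closeness can also be quantified rather than judged by eye. One option is the total variation distance, sketched below with a hypothetical helper total_variation on made-up distributions. This metric is an illustration chosen here, not something the tutorial itself computes.

```julia
# Hypothetical helper: total variation distance between two discrete distributions.
# It is 0 for identical distributions and 1 for distributions with disjoint support.
total_variation(p, q) = sum(abs, p .- q) / 2

p = [0.25, 0.25, 0.25, 0.25]
q = [0.40, 0.30, 0.20, 0.10]

println(total_variation(p, p))
println(total_variation(p, q))
println(total_variation([1.0, 0.0], [0.0, 1.0]))
```

With the variables from the training section, total_variation(trained_pg, pg) would give a single number summarizing how far the trained distribution is from the target.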
This page was generated using Literate.jl.