knowm.ai

kt-ram · memristors · differential-pair · noise · derivation

kT-bit Read Noise, Derived

The full derivation behind Chapter 3b's read-noise equations: where the thermal term, the flicker term, and their match to the Beta posterior come from.

By Alex Nugent

This is the companion derivation for Chapter 3b: The Thermodynamic Bit, Derived. That chapter states three results and points here for the work behind them:

$$\sigma_{\text{thermal}} \;\propto\; \frac{\sqrt{k_B T}}{V\sqrt{m}}, \qquad \sigma_{\text{flicker}} \;\propto\; \frac{1 - w^2}{\sqrt{m}}, \qquad \sigma_w \;=\; \sqrt{\sigma_{\text{thermal}}^2 + \sigma_{\text{flicker}}^2}.$$

No calculus is assumed. Every step is spelled out, and the goal throughout is to show that those dependences — 1/V, 1/√m, (1 − w²) — are not assumed. They fall out of the physics, and only afterward do we recognize them.

The setup

A kT-bit stores a weight in two conductances, Ga and Gb. The weight is their balance:

$$w = \frac{G_a - G_b}{G_a + G_b}.$$

Name the bottom — the sum of the two conductances — the magnitude:

$$m = G_a + G_b.$$

So w = (Ga − Gb)/m. We drive the pair with a read voltage V (top at +V, bottom at −V), measure the junction voltage Vy, and report the weight as the normalized read y = Vy / V, which lands in [−1, 1] no matter how hard we drove it. A clean read returns y = w. A real read returns y = w + noise. Two sources set that noise, and we take them one at a time.
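As a quick check of the setup, the divider can be written out in a few lines. This is a minimal sketch, not code from Chapter 3b: the function names and conductance values are hypothetical, and the node voltage assumes ideal rails at +V and −V with nothing loading the junction.

```python
def weight_and_magnitude(g_a, g_b):
    """Return (w, m) for a differential pair of conductances."""
    m = g_a + g_b              # magnitude: the sum of the pair
    w = (g_a - g_b) / m        # weight: the balance of the pair
    return w, m


def normalized_read(g_a, g_b, v_read):
    """Noise-free read y = Vy / V with the top rail at +V and the bottom at -V."""
    # Current balance at the unloaded junction: Ga*(V - Vy) = Gb*(Vy + V)
    v_y = v_read * (g_a - g_b) / (g_a + g_b)
    return v_y / v_read


g_a, g_b = 3e-4, 1e-4          # siemens, illustrative values only
w, m = weight_and_magnitude(g_a, g_b)
y_soft = normalized_read(g_a, g_b, v_read=0.1)
y_hard = normalized_read(g_a, g_b, v_read=0.5)
print(w, m, y_soft, y_hard)
```

Both reads return the same y, which is the point of normalizing by V: the clean read equals w however hard the pair is driven.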

The thermal term

Johnson–Nyquist noise is the hiss every resistance carries simply from being above absolute zero. To find it on our read, we need two things: the resistance the junction node sees, and how that resistance turns into noise.

The resistance at the node. Here is a point worth slowing down on, because it looks backwards at first. For the signal, the two devices are in series: they form the voltage divider, and Vy = V(Ga − Gb)/(Ga + Gb) is the series-divider result. But Johnson noise depends on the resistance the node sees looking out toward ground, and for that the two devices are in parallel.

The reason is the standard rule for finding the impedance at a node: an ideal voltage source has zero internal resistance, so to ask “what does a small fluctuation at the node see?” you replace each driven rail with a short to ground (you “kill the sources”). Do that here and Ga’s far end — once at +V — is now grounded, and Gb’s far end — once at −V — is grounded too. Both devices now run from the node straight to ground, so they sit in parallel. The rails being stiff, low-impedance drives is exactly what lets a fluctuation treat them as ground.

Parallel conductances add, so the conductance seen at the node is Ga + Gb = m, and the resistance is its inverse:

$$R = \frac{1}{m}.$$

(This is the familiar result that a voltage divider’s output resistance is the two legs in parallel, Ra ∥ Rb.) A bigger pair is a lower resistance at the node.

Resistance into noise. Johnson’s law says a resistance R at temperature T carries a voltage-noise variance proportional to kT·R (the full form is 4·kT·R·Δf over a measurement bandwidth Δf; the bandwidth is a fixed property of the read, so we fold it into the constant). Substituting R = 1/m, the variance on the junction voltage goes as kT/m, and the noise itself — the square root — as:

$$\sigma_{V_y} \;\propto\; \sqrt{\frac{k_B T}{m}}.$$

Carry it back to the weight. We do not report Vy; we report y = Vy / V. Dividing the node noise by the read voltage V:

$$\sigma_{\text{thermal}} \;=\; \frac{\sigma_{V_y}}{V} \;\propto\; \frac{\sqrt{k_B T}}{V\sqrt{m}}.$$

Two things to read off it. The term falls as 1/√m, so a bigger pair reads quieter. And it falls as 1/V: the Johnson noise on the node does not care how hard we drive the pair, but the signal grows with V, so reading harder shrinks the noise relative to the signal. That 1/V is the read-voltage knob — the one dial we get to turn. Notice there is no w in this term at all: thermal noise is a flat floor, the same height across the whole weight range.
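To put rough numbers on the thermal term, here is a small sketch that uses the full Johnson form 4·kT·R·Δf with R = 1/m. The temperature (300 K), bandwidth (1 MHz), read voltages, and conductance values are assumptions chosen for illustration; they are not figures from the chapter.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def sigma_thermal(g_a, g_b, v_read, temp_k=300.0, bandwidth_hz=1e6):
    """Thermal read noise on the normalized read y = Vy / V."""
    m = g_a + g_b                      # conductance seen at the node (parallel legs)
    r_node = 1.0 / m                   # node resistance
    v_noise_rms = math.sqrt(4.0 * K_B * temp_k * r_node * bandwidth_hz)
    return v_noise_rms / v_read        # divide by V to carry it back to the weight


base = sigma_thermal(50e-6, 50e-6, v_read=0.1)       # balanced pair, w = 0
double_v = sigma_thermal(50e-6, 50e-6, v_read=0.2)   # same pair, read twice as hard
quad_m = sigma_thermal(200e-6, 200e-6, v_read=0.1)   # four times the magnitude
lopsided = sigma_thermal(90e-6, 10e-6, v_read=0.1)   # same m as base, w = 0.8

print(base, double_v / base, quad_m / base, lopsided / base)
```

Doubling V halves the noise, quadrupling m halves it, and changing w at fixed m leaves it untouched, matching the three observations above.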

The flicker term

The second source is flicker (also called 1/f) noise: the conductances slowly wander as charge traps in the material fill and empty. In a real memristor it is the louder of the two. It has one defining feature for us:

Each device wiggles by a fixed fraction of itself, not a fixed amount.

So Ga does not jiggle by some absolute amount — it jiggles by a small percentage of Ga, and Gb by the same percentage of Gb. Call that fraction ε (a small number, say 0.01 for 1%):

$$\delta G_a = \varepsilon\, G_a, \qquad \delta G_b = \varepsilon\, G_b.$$

This is the only physical input, and it says nothing about w. We have not assumed anything about how the noise depends on the weight. That dependence is what we are about to discover.

The question: when Ga and Gb each wiggle a little, how much does w wiggle? That wiggle in w is the flicker noise on the weight.

Step 1: how sensitive is w to each device?

If we nudge Ga by a tiny bit, how much does w move? That “amount w moves per unit nudge of Ga” is the sensitivity of w to Ga. (In calculus this is the partial derivative; read it as “responsiveness.”) For the ratio w = (Ga − Gb)/(Ga + Gb), the two sensitivities are:

$$\frac{\partial w}{\partial G_a} = \frac{2 G_b}{m^2}, \qquad \frac{\partial w}{\partial G_b} = -\frac{2 G_a}{m^2}.$$

Where they come from: bump Ga, and the top (Ga − Gb) rises by the bump while the bottom (Ga + Gb) rises by the same bump; working through how a fraction responds to that gives 2Gb/m². Bump Gb, and the top falls while the bottom rises — both push w the same way — giving −2Ga/m². The minus sign just means raising Gb lowers w, which is right, since Gb is the negative side.

The point to hold onto: there is no 1 − w² anywhere in these. They are built only from Ga, Gb, and m.
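For readers who would rather see the sensitivities than trust them, they can be checked by literally nudging each conductance and watching w move. The sketch below is one way to do that; the function names, the step size h, and the conductance values (arbitrary units) are illustrative assumptions.

```python
def weight(g_a, g_b):
    return (g_a - g_b) / (g_a + g_b)


def sensitivities(g_a, g_b):
    """The two closed-form sensitivities quoted in the text."""
    m = g_a + g_b
    return 2.0 * g_b / m**2, -2.0 * g_a / m**2


def nudge_slopes(g_a, g_b, h=1e-6):
    """Movement of w per unit nudge, measured with a small symmetric nudge."""
    slope_a = (weight(g_a + h, g_b) - weight(g_a - h, g_b)) / (2.0 * h)
    slope_b = (weight(g_a, g_b + h) - weight(g_a, g_b - h)) / (2.0 * h)
    return slope_a, slope_b


g_a, g_b = 3.0, 1.0                    # arbitrary units, so m = 4 and w = 0.5
formula_a, formula_b = sensitivities(g_a, g_b)
measured_a, measured_b = nudge_slopes(g_a, g_b)
print(formula_a, measured_a)           # both close to 2*1/16 = 0.125
print(formula_b, measured_b)           # both close to -2*3/16 = -0.375
```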

Step 2: each device’s contribution to the wiggle

The wiggle in w from one device is (sensitivity) × (size of that device’s wiggle). From Ga:

$$\frac{2 G_b}{m^2} \cdot \varepsilon G_a.$$

From Gb:

$$\frac{2 G_a}{m^2} \cdot \varepsilon G_b.$$

(The minus sign drops away next, because we are about to square — direction stops mattering, only size does.)

Step 3: combine the two wiggles in quadrature

The two devices wiggle independently: Ga’s wiggle has nothing to do with Gb’s. Independent random wiggles do not just add; you add their squares and take the square root at the end. (Same rule as the Pythagorean theorem: two independent legs combine through a² + b², not a + b. This is “adding in quadrature.”)

$$\sigma_{\text{flicker}}^2 = \left(\frac{2 G_b}{m^2}\,\varepsilon G_a\right)^2 + \left(\frac{2 G_a}{m^2}\,\varepsilon G_b\right)^2.$$

Square each piece and both terms come out identical, each 4 ε² Ga² Gb² / m⁴:

$$\sigma_{\text{flicker}}^2 = \frac{8\,\varepsilon^2 G_a^2 G_b^2}{m^4}.$$

Take the square root (√8 = 2√2):

$$\sigma_{\text{flicker}} = \frac{2\sqrt{2}\,\varepsilon\, G_a G_b}{m^2}.$$
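Steps 2 and 3 can be traced with concrete numbers. The following sketch computes each device's contribution, combines them in quadrature, and compares against the closed form; the function names and the values Ga = 3, Gb = 1, ε = 0.01 are hypothetical choices for illustration.

```python
import math


def flicker_contributions(g_a, g_b, eps):
    """Step 2: (sensitivity) x (size of wiggle) for each device, sign dropped."""
    m = g_a + g_b
    from_a = (2.0 * g_b / m**2) * (eps * g_a)
    from_b = (2.0 * g_a / m**2) * (eps * g_b)
    return from_a, from_b


def flicker_sigma(g_a, g_b, eps):
    """Step 3: combine the two independent contributions in quadrature."""
    from_a, from_b = flicker_contributions(g_a, g_b, eps)
    return math.hypot(from_a, from_b)   # sqrt(from_a**2 + from_b**2)


def closed_form(g_a, g_b, eps):
    return 2.0 * math.sqrt(2.0) * eps * g_a * g_b / (g_a + g_b) ** 2


from_a, from_b = flicker_contributions(3.0, 1.0, 0.01)
print(from_a, from_b)                   # the two contributions are equal in size
print(flicker_sigma(3.0, 1.0, 0.01), closed_form(3.0, 1.0, 0.01))
```

The two contributions match even though the devices differ in size: the larger device wiggles more but w is less sensitive to it, and the two effects cancel exactly.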

Step 4: look at what fell out

The combination that landed in the result, on its own, is:

$$\frac{G_a G_b}{m^2}.$$

We never put it there. It is a consequence of how the ratio w responds when you jiggle its two inputs. If the math had produced something lopsided like Ga²/m², we would be stuck with that. It produced exactly Ga·Gb/m².

Step 5: the identity that translates it

Now — and only now — recall a piece of pure algebra. Take 1 − w²:

$$1 - w^2 = \frac{(G_a + G_b)^2 - (G_a - G_b)^2}{(G_a + G_b)^2}.$$

Expand the top. The Ga² terms cancel, the Gb² terms cancel, and the cross terms add: (A+B)² − (A−B)² = 4AB for any A and B. So:

$$1 - w^2 = \frac{4 G_a G_b}{m^2}.$$

This identity is inert on its own — a true statement claiming nothing about noise. Its only job is to act as a dictionary.
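Because the identity is pure algebra, it should hold for any pair of conductances, and a quick numerical spot check confirms it. The pairs below are arbitrary illustrative values, and the function names are hypothetical.

```python
def one_minus_w_squared(g_a, g_b):
    w = (g_a - g_b) / (g_a + g_b)
    return 1.0 - w * w


def product_form(g_a, g_b):
    m = g_a + g_b
    return 4.0 * g_a * g_b / m**2


pairs = [(1.0, 1.0), (3.0, 1.0), (0.2, 7.5), (5.0, 0.0)]
for g_a, g_b in pairs:
    # The two columns agree for every pair, balanced or lopsided
    print(g_a, g_b, one_minus_w_squared(g_a, g_b), product_form(g_a, g_b))
```

The balanced pair gives 1 and the fully one-sided pair gives 0, the two extremes of the factor.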

Step 6: use the dictionary

Rearrange: Ga·Gb/m² = (1 − w²)/4. Substitute into the Step 3 result — not to create a (1 − w²), but to rename the Ga·Gb/m² the physics already produced:

$$\sigma_{\text{flicker}} = 2\sqrt{2}\,\varepsilon \cdot \frac{1 - w^2}{4} = \frac{\varepsilon}{\sqrt{2}}\,(1 - w^2).$$

We did not assume (1 − w²). The noise propagation produced Ga·Gb/m², and the algebraic identity translated it into the language of w.

Read the (1 − w²) straight off. At w = 0 (balanced pair), 1 − 0 = 1: noise is loudest, because the difference Ga − Gb sits near zero and tiny wiggles shove a near-zero number around enormously in relative terms. At w = ±1 (one side dominates), 1 − 1 = 0: noise vanishes, because w is pinned against its ceiling and can barely move. Loud in the middle, silent at the rails.
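The (ε/√2)(1 − w²) result can also be tested by brute force: wiggle both conductances at random many times, recompute w each time, and measure the scatter. The sketch below does this under one added assumption that the derivation does not require, namely that each fractional wiggle is Gaussian with rms size ε. The sample count, seed, and function names are likewise illustrative.

```python
import math
import random
import statistics


def simulated_flicker_sigma(w, eps=0.01, m=1.0, n=20000, seed=0):
    """Scatter of w when Ga and Gb each wiggle by an independent fraction."""
    rng = random.Random(seed)
    g_a = m * (1.0 + w) / 2.0          # conductances that produce this w and m
    g_b = m * (1.0 - w) / 2.0
    reads = []
    for _ in range(n):
        a = g_a * (1.0 + eps * rng.gauss(0.0, 1.0))   # assumed Gaussian wiggle
        b = g_b * (1.0 + eps * rng.gauss(0.0, 1.0))
        reads.append((a - b) / (a + b))
    return statistics.pstdev(reads)


def predicted_flicker_sigma(w, eps=0.01):
    return eps / math.sqrt(2.0) * (1.0 - w * w)


for w in (0.0, 0.5, 0.9):
    print(w, simulated_flicker_sigma(w), predicted_flicker_sigma(w))
```

The simulated scatter should track the prediction to within sampling error at each weight: largest at w = 0 and shrinking toward the rails.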

Where the 1/√m comes from

Everything above gives the (1 − w²) factor exactly. It does not, by itself, give the 1/√m. That second factor enters through one more physical input: the fractional flicker amplitude ε itself shrinks as the pair accumulates conductance. More conductance means more independent microscopic fluctuators contributing, and independent fluctuators average down as 1/√N, the same square-root-of-count law as the quadrature in Step 3. Taking the count N to be proportional to the magnitude m, ε ∝ 1/√m, which gives:

$$\sigma_{\text{flicker}} \;\propto\; \frac{1 - w^2}{\sqrt{m}}.$$

This is the form Chapter 3b uses. The (1 − w²) is exact from the ratio; the 1/√m is a well-justified modeling input layered on top. And note: flicker is a fractional fluctuation of the conductances themselves, so it carries no factor of V. Reading harder does not quiet it — only more evidence does.
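The averaging-down argument can be illustrated with a toy model. It is an assumption made for this sketch, not a device model from the chapter: a conductance is treated as the sum of N identical unit contributions, each wiggling independently by a Gaussian fraction eps_single. All names and values are hypothetical.

```python
import math
import random
import statistics


def fractional_wiggle(n_fluctuators, eps_single=0.05, trials=4000, seed=1):
    """Fractional rms fluctuation of a sum of independent unit fluctuators."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # Each fluctuator carries one unit of conductance with its own wiggle
        total = sum(1.0 + eps_single * rng.gauss(0.0, 1.0)
                    for _ in range(n_fluctuators))
        totals.append(total)
    return statistics.pstdev(totals) / statistics.mean(totals)


for n in (1, 4, 16, 64):
    eps_total = fractional_wiggle(n)
    # If eps falls as 1/sqrt(N), this rescaled value stays near eps_single
    print(n, eps_total, eps_total * math.sqrt(n))
```

In this toy model the rescaled column stays close to 0.05 as N grows, which is the ε ∝ 1/√N behavior the text relies on.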

The total

The thermal and flicker sources are physically independent, so they too combine in quadrature:

$$\sigma_w = \sqrt{\sigma_{\text{thermal}}^2 + \sigma_{\text{flicker}}^2} \;\propto\; \sqrt{\frac{k_B T}{V^2\, m} + \frac{(1 - w^2)^2}{m}}.$$

Both terms carry the 1/√m, so the whole read quiets as the pair gathers evidence. The thermal term is a flat floor whose height you set with V; the flicker term adds the (1 − w²) bulge on top, widest at the balance point and closing toward the rails. Raise V and the thermal floor sinks beneath the peak of the flicker bulge, leaving flicker as the limit near the balance point; lower V and the floor rises above it. That is the read-voltage dial.
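The combined expression is easy to explore numerically. In the sketch below, c_thermal and c_flicker are hypothetical proportionality constants standing in for everything the ∝ signs hide (temperature, bandwidth, material flicker strength); they are set to 1 only so the shapes can be compared.

```python
import math


def read_noise(w, m, v_read, c_thermal=1.0, c_flicker=1.0):
    """Return (total, thermal, flicker) read noise in arbitrary units."""
    thermal = c_thermal / (v_read * math.sqrt(m))        # flat floor, set by V
    flicker = c_flicker * (1.0 - w * w) / math.sqrt(m)   # bulge, widest at w = 0
    total = math.hypot(thermal, flicker)                 # quadrature sum
    return total, thermal, flicker


for v_read in (0.5, 2.0, 8.0):
    for w in (0.0, 0.5, 1.0):
        total, thermal, flicker = read_noise(w, m=100.0, v_read=v_read)
        print(v_read, w, round(total, 4), round(thermal, 4), round(flicker, 4))
```

At w = 1 the flicker column is zero and the total equals the thermal floor; at w = 0 the total is dominated by whichever of the two terms the chosen V leaves larger.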

Why this is the shape of the Beta posterior

Chapter 3b reads the same two conductances as a Beta distribution: the a-side count is α = Ga, the b-side is β = Gb. Three numbers come off that distribution:

$$\mu = \frac{\alpha}{\alpha+\beta}, \qquad \kappa = \alpha + \beta, \qquad \sigma_\mu^2 = \frac{\mu(1-\mu)}{\kappa + 1}.$$

Here is the variable dictionary, the part that is easy to lose track of:

| Beta term | kT-bit term | meaning |
| --- | --- | --- |
| mean μ | weight, as w = 2μ − 1 | which way the belief leans |
| concentration κ | magnitude m = Ga + Gb | how much evidence is behind it |
| variance σ²_μ | spread of the belief | how unsure it still is |

Two substitutions line the belief’s spread up with the device’s read noise.

First, the lean. Since μ = (1 + w)/2 and 1 − μ = (1 − w)/2, their product is:

$$\mu(1-\mu) = \frac{(1+w)(1-w)}{4} = \frac{1 - w^2}{4}.$$

The same (1 − w²) the flicker derivation produced is exactly the Beta’s μ(1 − μ), rescaled. The belief is widest at the balance point and collapses toward the certain ends, the same way the read noise does.

Second, the evidence. The Beta’s concentration κ is the magnitude m, so its variance carries the 1/(κ + 1) = 1/(m + 1) that narrows as evidence climbs. Putting both into the belief’s standard deviation on w (which is 2σ_μ, since w = 2μ − 1):

$$\sigma_w^{\text{belief}} = 2\sigma_\mu = \frac{\sqrt{1 - w^2}}{\sqrt{m + 1}} \;\approx\; \frac{\sqrt{1 - w^2}}{\sqrt{m}}.$$

Compare the two side by side. The device’s read scatter and the Beta belief’s spread both fall as 1/√m with evidence, and both close toward the rails through a (1 − w²) factor. The match is in the form, not the exact exponents: the flicker term carries (1 − w²) where the belief carries √(1 − w²), and the thermal term adds a w-independent floor the pure belief does not have. So the device read scatters the way its belief is spread — narrowing with evidence, tightening toward a decision — without that being an exact algebraic identity. The device samples something with the same two-way dependence as its own posterior — on evidence and on the lean — and the randomness is the device’s own physical noise.
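The belief-side formula can be checked the same way as the device side. The sketch below computes 2σ_μ from the Beta variance, compares it to √(1 − w²)/√(m + 1), and then confirms both against samples drawn from a Beta distribution. Treating the conductances as dimensionless counts (α = 30, β = 10) is an assumption for illustration, as are the function names, sample count, and seed.

```python
import math
import random
import statistics


def belief_sigma_w(g_a, g_b):
    """Spread of the belief on w, as 2*sigma_mu from the Beta variance."""
    alpha, beta = g_a, g_b
    kappa = alpha + beta
    mu = alpha / kappa
    var_mu = mu * (1.0 - mu) / (kappa + 1.0)
    return 2.0 * math.sqrt(var_mu)


def closed_form_sigma_w(g_a, g_b):
    m = g_a + g_b
    w = (g_a - g_b) / m
    return math.sqrt(1.0 - w * w) / math.sqrt(m + 1.0)


def sampled_sigma_w(g_a, g_b, n=20000, seed=2):
    """Draw mu from Beta(Ga, Gb), map to w = 2*mu - 1, and measure the spread."""
    rng = random.Random(seed)
    samples = [2.0 * rng.betavariate(g_a, g_b) - 1.0 for _ in range(n)]
    return statistics.pstdev(samples)


g_a, g_b = 30.0, 10.0                  # hypothetical counts: w = 0.5, m = 40
print(belief_sigma_w(g_a, g_b), closed_form_sigma_w(g_a, g_b), sampled_sigma_w(g_a, g_b))
```

This only verifies the Beta side of the comparison. As the text says, the device's flicker term carries (1 − w²) rather than √(1 − w²), so the match with the read noise is in form, not an identity.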

In one paragraph

A read returns the stored weight plus noise from two sources. Thermal noise sets a flat floor √(kT)/(V√m) — quieter for a bigger pair, and tunable through the read voltage V. Flicker noise wiggles each conductance by a fixed fraction of itself; propagating those two independent wiggles through the ratio w = (Ga − Gb)/(Ga + Gb) produces 2√2·ε·Ga·Gb/m², and the algebraic identity 1 − w² = 4Ga·Gb/m² renames that as (1 − w²), with a further 1/√m from the fractional amplitude falling with evidence. The two add in quadrature. The result narrows the same two ways the Beta posterior does — with evidence, and toward the rails — so a read is a sample whose width tracks the pair’s own confidence.