knowm.ai

assimilaate · memristors · ai-hardware · edge-ai

Transforms will be Assimilated

Ship synapses. Stream transforms. Forget weights.

By Alex Nugent ·

A synapse is not a weight. It is a physical adaptive communication element. In the Knowm world, it’s a memristor pair. A synapse unites memory and processing, and the only way to ship it is via snail mail.

Present-day AI runs on weights, or more precisely on our ability to compute them and move them around. We now spend absurd amounts of energy scraping the internet and human knowledge to iteratively learn what those numbers should be. Then we spend still more energy hauling them back and forth through networks and memory hierarchies so a model can consult that learned experience during inference. After a while it starts to feel obvious that the weights must be the thing that matters.

Weights don’t matter. They are a means to an end. What matters is transforming one neural activation into another. And while it is true that synaptic weights are largely responsible for mediating the transform, there is no deep reason to communicate weights, not for distribution and not for inference. To head off some of you at the pass: I am not saying that AI of the future will work like we do, where every individual is tasked with their own internal learning. It’s far more interesting and powerful than that. Let me give you a metaphor to meditate on and hopefully knock you into another attractor.

OpenAI Playground screenshot illustrating the Theseus's ship paradox as a metaphor for identity and atomic replacement.
You are not the atoms that compose you. You are activations of energy flowing through an adaptive container.

Do you think that you are (only) the totality of the atoms that compose you? The atoms of that sandwich you had for lunch last week are now partially integrated into your body and will be expelled in a variety of ways, from days to months. Every atom that composes you right now will likely be replaced long before you die. You—and all living things—are Theseus’s Ship sailing from birth to death across the ocean of life. You are activations of energy flowing through a plastic container. When you remove those activations the atoms that temporarily compose your body will disintegrate into a pile. So let me ask you again. Do you think that you are only the atoms that compose you?

Now that we’ve shaken off the card-carrying eliminative materialists, let me assure the rest of you that what you will read in this blog series is nuts-and-bolts. I am not only going to explain how this all works, I will build it and I will put it on the internet. It’s not metaphorical and it’s not abstract. So let me quickly give you a description of how I believe the future of AI will work. May it serve as a waypoint marker on what might be a wandering (and hopefully interesting) walk in the future articles on this website.

Datacenters, the same ones we have now and likely a few more of them, will spend tremendous amounts of energy integrating more information than you can possibly conceive from every corner of the earth. That information will be computed into models that closely resemble what we have now. These models will be tweaked architecturally from time to time, more now, less later, as we arrive at a description of a near-optimal universal intelligence substrate. The models will be trained continuously, day after day. These models will not be used directly for inference. Their weights will not be copied to inference machines and those inference machines will not shuttle weights back and forth between memory and processing cores. Rather, the transforms will be represented as sparse activation address tuples (AATs), communicated to billions of edge devices near continuously at a tiny fraction of the bandwidth required to move weights, where they will direct the self-organization of a variety of physical synaptic substrates of various ages and model numbers via a process I call assimilation. Assimilation will be fast, much faster than downloading weights. It will usually be done at night or in the off hours alongside active self-repair and more traditional security updates, but also interwoven with active inference if downtime does not permit. Local copies will differentiate slightly to further optimize to their local environment without any supervision. Billions of semi-autonomous and autonomous platforms will gather information of every imaginable flavor and upload what’s new back to the cloud, where it will be integrated and disseminated across the fleet before the day is done.

Before I Knew of The Word Memristor

In 2001 I was a junior undergraduate physics major with a new mission. A year earlier I had stumbled into neural networks after a random walk through my college library. I was frustrated with quantum mechanics, which I was supposed to be studying, so I got up and wandered around. On the third floor I found Jim Jubak’s In the Image of the Brain: Breaking the Barrier Between the Human Mind and Intelligent Machines. It was about neuromorphic computing, neural networks, and the idea that brains were not just digital computers running better software. They were a different kind of machine.

I became obsessed. I read everything I could find, learned about backpropagation of error, programmed small neural networks, and took any course with “Neuro” in the title. The conclusion I came to: the separation of memory and processing is going to be the key barrier to future AI. A brain did not shuttle bits back and forth between a memory bank and an arithmetic unit. A brain did it all in the same physical structure at the same time. The structure that seemingly allowed that to happen without burning gigawatts was the synapse. So I started to wonder how I could build one.

My first idea for building a physical neural network was to somehow shatter glass and fill the cracks with conductive material. I pictured one of those crystal balls an oracle might hold, fractured into networks. It was a beautiful but not-even-half-baked idea, and I had no idea whatsoever how to make it work. I quickly abandoned it for another idea that I could wrap my little undergraduate physics head around. I only mention it as a sort of foreshadowing of what’s to come—one of those strange premonitions that makes you think reality and time may not work exactly like we think.

My second idea for an artificial synapse was a resistance-changing connection made from nanoparticles in a colloidal suspension between electrodes on an integrated chip. Apply a voltage across a gap, induce dipoles in the particles, pull them into the gap, and form a bridge. The bridge would change the resistance, the resistance would be the synaptic weight, and the act of applying voltage would be the act of adaptation. I did not know the word memristor yet. I did not know Leon Chua had hypothesized a missing circuit element almost a decade before I was born. I was just an enthusiastic physics student with a growing suspicion that adaptive resistance was the physical primitive I wanted.

Screenshot from OpenAI Playground describing an early nanoparticle-based artificial synapse concept.
My early physical network and synapse ideas

I kept asking my mother (Knowm Inc partner Hillary Riggs) why nobody was doing this. After many months of asking the same question, she said something to the effect of: “shut up and do it.” Let’s get a patent. Patent attorneys Kermit Lopez and Luis Ortiz joined shortly after as partners and the first Knowm patents were born, with many more to come. While those first patents are now expired as it has been over 20 years (!!), we filed the latest Knowm patent yesterday.

The Synapses Were Pretty Shitty

The first technical problem was obvious: a learned weight needs to go positive and negative, but a single resistance does not become negative. The answer was a differential pair. One resistive element contributes the positive side, one contributes the negative side, and the signed synaptic value lives in the difference. Differential pair memristor synapses are one of those things that become obvious after you see them. The fact that seemingly all of the natural world is formed from adaptive, competing differential energy dissipation pathways had yet to dawn on me, but I’ll be talking a lot more about that. That would come during my time advising the DARPA Physical Intelligence program after Todd Hylton and I successfully launched the DARPA SyNAPSE program. But I digress.
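The differential pair can be sketched in a few lines of Python. This is only an illustration of the idea that the signed value lives in the difference of two non-negative conductances; the class name, field names, and conductance values are hypothetical and do not describe any particular device.

```python
from dataclasses import dataclass


@dataclass
class DifferentialSynapse:
    """Signed synapse built from two non-negative conductances (siemens)."""
    g_pos: float  # conductance of the element on the positive side
    g_neg: float  # conductance of the element on the negative side

    def weight(self) -> float:
        # The signed synaptic value lives in the difference.
        return self.g_pos - self.g_neg

    def current(self, voltage: float) -> float:
        # Ohm's law on each branch, read out differentially.
        return voltage * self.g_pos - voltage * self.g_neg


# Illustrative values only: neither conductance is ever negative,
# yet the pair can represent either sign.
excitatory = DifferentialSynapse(g_pos=5e-5, g_neg=1e-5)
inhibitory = DifferentialSynapse(g_pos=1e-5, g_neg=5e-5)
```

Here `excitatory.weight()` is positive and `inhibitory.weight()` is negative, even though every physical conductance stays above zero.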

The second technical problem with synapses was more confounding. My particle synapses were volatile, noisy, unstable, and annoying in all the ways actual physical devices are annoying (in particular small ones). For a while I tried to ignore it, to treat it as a defect and ‘sweep it under the rug’, but then I realized biology had the same problem. Biological synapses are not precision components. They are sketchy little adaptive structures in a dynamic (living) system that is constantly repairing itself. If the brain is volatile, and it obviously is, then intelligence must be compatible with volatility. Heck, it might require it. To be alive is almost, by definition, to be volatile. Maybe repair in the face of volatility is not a side effect of intelligence. Maybe that is intelligence. These were the ideas being floated while I advised the DARPA SyNAPSE and Physical Intelligence programs and from which Todd and his growing team of SETAs tried—and failed—to launch the DARPA Thermodynamic Computing program.

My history in this line of thought goes back to my work at Los Alamos National Laboratory (LANL) with Anti-Hebbian and Hebbian (AHaH) plasticity. After graduating with an undergraduate degree in physics I spent a year at LANL and found myself on a project related to future nano-scale computing systems. I worked with Reid Porter (Space Data System) and Garret Kenyon (Computational Neuroscience), and my research was heavily influenced by my seemingly shitty synapses. I trained neural network classifiers, damaged their weights, connections and neurons, and then tested synaptic plasticity rules to see if any of them could repair the learned state while the network operated. Almost all of the rules failed. Some drove the weights to infinity. Some drove them to zero. Some oscillated. Some destroyed the classifier and marched performance straight to random choice. Then one actually worked, and we proceeded to work backwards from that.

OpenAI Playground screenshot illustrating AHaH plasticity and volatile synapse repair.
Unsupervised AHaH plasticity actively repairing synaptic weights to maintain an optimal classification boundary.

That was the moment I stopped thinking of volatility as merely a hardware problem. Volatility was a clue and I had what appeared to be a tiny foothold on a solution. If the devices could adapt locally via plasticity rules operating on structured information, then the hardware did not need to be perfect. Not only that, but the exact state of the synapses at any given time did not really matter because the optimal set of weights was determined as much by the physical state of the network as the information it was processing. If the neural state recovered after the damage because the weights adapted to compensate, then the specific weights don’t matter. AHaH plasticity was acting to keep the neuron in its attractor, an attractor defined by the structure of the information and how the neuron’s transfer function related to that information, not by its weights. Hence the weights changed and the neuron’s transform was repaired. The information for self-assembly is in the datastream.

DARPA: SyNAPSE, Physical Intelligence and Thermodynamic Computing

There is a much longer story here, and I have told part of the first decade of it here: Knowm History 2001-2011. The short version is that after dropping out of graduate school (PhD EE), moving back to Santa Fe, New Mexico, and trying to keep the whole thing alive with a PC and stubbornness, Hillary and I eventually found our way to Washington, DC. We met with the Office of Naval Research. We met with the Small Business office of the National Science Foundation. I gave a talk at the patent office. The NSF meeting was, shall we say, “character-building”. I was told by the National Science Foundation that “we don’t fund science projects” and to get out of his office. I will not say his last name but it sounds somewhat like “Arch Dick”. That was my first hard lesson in the politics of innovation: while the idea is a necessary component, it is very much up for grabs by the belt-way-bandits. The rest is access and status. People for the most part don’t listen to the message unless it comes from the right messenger. The information is not valued. Rather, it’s the container of the information that people value. Which is why our last meeting on that trip was ultimately the most important.

My presentation to the Atlantic Nano Forum at the U.S. Patent Office on December 6th, 2005.

Our last stop was to meet Todd Hylton, who was then a director at the nanotechnology division of SAIC. Todd listened and had the physics background to understand what I was saying. He also seemed to have that same sense that ‘something else is out there’ and he was searching for it. He had the institutional position to walk into rooms that would not take me seriously. We could both say exactly the same words but whereas I would be ridiculed and denigrated he might command at least a moment of attention. I remember his words very clearly. “I am having a hard time right now determining if you are a genius or insane, but I’d like to keep working with you to better understand this”. And we did. That collaboration with Todd led to his hiring at DARPA as a program manager by then-director Tony Tether for what became the Systems of Neuromorphic Adaptive Plastic Scalable Electronics program, better known as DARPA SyNAPSE. In 2008, with HP co-announcing that it had “discovered” the memristor, the SyNAPSE program kicked off with HP, IBM, and HRL as major performers. I spent the next four years on the government advisory team watching millions of dollars being spent with top US research companies attacking neuromorphic adaptive hardware.

DARPA SyNAPSE program plan briefing slide with the Phase 0 component synapse development hardware deliverable highlighted in the upper left.
DARPA SyNAPSE program plan briefing slide. Note the particle synapse and differential synapses highlighted in the top as Phase 0 hardware deliverables.

I learned a lot. Some of it was technical, some of it was political, and some of it I can’t talk about. It was the kind of education you only get by being close enough to the machine to smell the oil. Some of it was amazing, and some of it stunk. But the core lesson never changed: if we want brain-scale computation in an energy and volume budget that makes sense, we need adaptive physical memory at the site of computation. We need synapses, and those synapses will likely be a little volatile.

Finding the Self Directed Channel Memristor

DARPA has a rule that was central to its function. Program managers could only stay for about five years. I’m not sure if that is still how it works now. The idea is that DARPA’s role is to “prevent technological surprise”. It has to avoid the calcification that is rampant in other areas of the military industrial complex, where the same people with the same stale ideas become entrenched and control all the resources and direct them to all their same friends. So after those years Todd had to go, and with that all his “SETAs” (Science and Engineering Technical Advisors), which included me.

I met Robinson Pino, then with Air Force Research Labs, on a SyNAPSE program site visit. I forget the exact one, probably IBM in Palo Alto or HRL Labs in Malibu. We talked for a few hours over drinks in a hotel bar. He encouraged me to apply for an SBIR grant and pointed me to the exact solicitation: “VLSI Building Blocks for Future Autonomous Air Vehicles”. After flying back to Santa Fe I got to work. I had never directly applied for government funding and there was a lot of administrative work to do and a proposal to write. I had less than a week. But with Hillary’s help we did it and “M. Alexander Nugent Consulting” was awarded a contract. That set my post-DARPA path. I was going to manage my own small-scale research effort, perhaps finally being able to realize the goal I had embarked on so many years before. I immediately hired my long-time friend and Knowm Inc partner Tim Molter to assist with the effort.

As part of this SBIR program I formalized my thoughts on the volatile synapse with a mathematical model of a memristor. It was an extension of my US patent 7,599,895 called “Methodology for the configuration and repair of unreliable switching elements” (US7599895B2). I needed a mathematical model of a memristor and so I built it from collections of metastable switches. I called it the generalized metastable switch (MSS) memristor model: The Generalized Metastable Switch Memristor Model.

It was a simple but effective memristor model that was powered at its core by stochastic switching. By tweaking transition probabilities and the number of metastable switches, one could model a wide range of devices. The SBIR effort was purely theoretical and simulation based, but of course it had to prove that the ideas would map to ultra-low-power hardware and thus everything had to be built up from physical models of devices. Tim and I first published this work as part of our AHaH Computing paper in 2014, but the ideas that underpinned it had been percolating for a decade.
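To make the idea of a memristor built from many stochastic two-state switches concrete, here is a minimal Python sketch. It is not the published MSS model: the logistic form of the transition probability, the class and function names, and every parameter value are assumptions chosen only to show how transition probabilities and switch count shape the device response.

```python
import math
import random


def switch_probability(voltage, threshold, steepness, rate):
    """Assumed logistic probability that a single switch flips in one time step."""
    return rate / (1.0 + math.exp(-steepness * (voltage - threshold)))


class MetastableSwitchMemristor:
    """Hypothetical device: N two-state switches conducting in parallel."""

    def __init__(self, n_switches=1000, g_on=1e-4, g_off=1e-6, seed=0):
        self.n = n_switches
        self.n_on = 0                      # switches currently in the ON state
        self.g_on = g_on / n_switches      # per-switch ON conductance
        self.g_off = g_off / n_switches    # per-switch OFF conductance
        self.rng = random.Random(seed)

    def conductance(self):
        # Total conductance is the sum over all switches in both states.
        return self.n_on * self.g_on + (self.n - self.n_on) * self.g_off

    def step(self, voltage, threshold=0.2, steepness=40.0, rate=0.05):
        # Positive bias favors OFF -> ON, negative bias favors ON -> OFF.
        p_on = switch_probability(voltage, threshold, steepness, rate)
        p_off = switch_probability(-voltage, threshold, steepness, rate)
        n_off = self.n - self.n_on
        turned_on = sum(self.rng.random() < p_on for _ in range(n_off))
        turned_off = sum(self.rng.random() < p_off for _ in range(self.n_on))
        self.n_on += turned_on - turned_off


device = MetastableSwitchMemristor(seed=1)
g_start = device.conductance()
for _ in range(200):
    device.step(+0.5)          # positive pulses raise conductance
g_high = device.conductance()
for _ in range(200):
    device.step(-0.5)          # negative pulses lower it again
g_low = device.conductance()
```

Changing `rate`, `threshold`, `steepness`, or `n_switches` changes how gradual and how noisy the conductance trajectory is, which is the knob-turning the paragraph above describes.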

Artistic AI rendering of the metastable switch memristor model.

It was at this time that I was made aware of a memristor device from Kris Campbell at Boise State University. I tweaked the parameters of the MSS model to fit an IV curve from a published paper and it seemed to fit nicely. I heard that some folks from AFRL were going to visit Dr. Campbell in Boise and I endeavored to tag along. I wanted to meet the guy. I only discovered once I arrived that he was actually a she. My preconceptions were undoubtedly influenced by the fact that over the whole SyNAPSE and PI programs I think I saw two women performers, and one of those transitioned from a man later. It’s an absurdly male dominated field, for the worse I might add, and Kris immediately stood out to me. Not just because she was a woman but because she had seemingly already accomplished something that so many other groups were flailing to achieve and yet this was the first I had heard of it. Remember my less-than-half-baked idea about building a neural network from “fractured glass”? The devices Kris was fabricating were ion-conducting devices built around a Ge-Se glass layer through which silver ions migrate under applied bias to form and dissolve a conductive channel. The fractured glass solid electrolyte layer engineered into the stack provides preferential pathways that confine where the Ag channel forms and reforms across cycles. This confinement is what gives the SDC device its edge compared to the filament formation that limits conventional CBRAM cells. The result is a low-current, low-voltage, high endurance, high yield, high temperature memristor well-suited to the commercialization of analog neuromorphic learning architectures.

Tim and I made rapid progress on the SBIR program, which progressed through phase 1, phase 2 and a phase 2 extension over the years. We also received an additional STTR program with AFRL, and a phase-1 SBIR with the Office of Naval Research. It was an extremely busy and productive time. As part of our work that led to Thermodynamic RAM, Tim and I had completed our goals ahead of schedule and under budget. We had money to spare. I contacted Kris Campbell to see if she could create a memristor specifically suited to AHaH Computing. She said ‘probably’, so that’s where we spent that money. Kris delivered on that “probably” and my consulting company MANC obtained the exclusive world-wide license. This would later be transferred to Knowm Inc. Over a decade later, Knowm SDC memristors are still the only commercially available memristors that researchers can obtain to support memristor research. That time may be coming to an end, but I’ll get to that.

Knowm Inc

I did not intend to form Knowm Inc from the start. The idea that I could build a singular company that would bring this technology to the world seemed a bit absurd to me. The existing computing technology stack is MASSIVE in its complexity. Layers and layers of hardware. Layers and layers of software. Bridging even a couple of those layers is monumental. Bridging them all is borderline psychotic. It’s just a monumentally huge problem that requires so many people and some fantastically expensive and sophisticated pieces of equipment. That said, my work and the companies holding it were scattered and it was feeling like it all needed to be more coherently brought together. We had “KnowmTech LLC”, the IP holding company Hillary and I formed with Kermit and Luis. I had MANC, a sole proprietorship Hillary helped me form for my consulting work with DARPA, which was later used for the SBIR/STTR programs and held the SDC memristor licenses. While in Minnesota at an extended family gathering I discussed what I was doing with Sam Barakat, a relative on my mother’s side. Sam lived in the Bay Area and had built and sold a successful consulting company and had a lot of experience navigating (and suing) large institutions. He expressed great interest in investing and Knowm Inc was formed not long after.

The IP and experience from KnowmTech and MANC was consolidated into Knowm Inc. It was both exciting and very stressful — I was still in the middle of SBIR/STTR contract work and pulled in more directions than I could gracefully manage. It’s almost impossible to actually do technical work and also be a CEO. I recall when Bryant Wysocki at AFRL, later rising to the chief engineer, communicated that the upper brass wanted to “see headlines”. So that is what we did. Newly formed and with Sam’s investment, we hired a Bay Area marketing firm. It was my first real introduction to the life of a CEO and the absolute insanity of the Silicon Valley hustle. I’ll be honest, I absolutely fucking hated it.

Before that moment I was naive and thought the world worked a bit differently. I thought the articles I read about science and engineering discoveries were actual reporting, organic and earned. The marketing company opened a Rolodex and lined up interview after interview. They provided “advance copy” and the promise of a pre-press-release scoop to the “journalists”. I gave countless briefs and calls and conferences. I learned that almost the whole thing is pay-to-play and that the reason I saw so many breathless announcements from name-brand universities was that it was all just a big hustle behind the curtain.

Knowm was immediately pitted against HP in the press, which I found a bit ridiculous. Money evaporated fast. But I could see how directly it moved public perception, and that was honestly depressing to me. The CIA, via its venture arm In-Q-Tel, called and wanted to talk about buying a board seat. The NSA called a bit later for what seemed like a ‘patriot check’; I’m not really sure. Apple invited us to Infinite Loop. Representatives from Samsung traveled to my home in Santa Fe. The problem shifted from technology and conceptually unlocking the mysteries of the universe to finding ways to pretend to compete with organizations that were vastly more resourced than Knowm to build chips that would accelerate algorithms that had yet to emerge. I was told again and again to “fake it until you make it”, but honestly…fuck that. There is no “faking” technology and the mere suggestion of that pisses me off. That’s how you end up on the cover of Forbes followed by a stint in prison. The problem of using greed and fear to attract attention and form alliances to unlock biased human perception is not the problem I want to solve. I was not attracted to the hustle or the attention. I just wanted to understand and unlock the mysteries of the universe and Nature. I still do.

And you know what? Knowm was a solution in search of a problem — and that’s the cruel irony of foresight. People don’t invest in problems they haven’t felt yet. Machine learning hadn’t yet demonstrated its potential, so the energy wall it would eventually run into didn’t exist. I was being asked to demonstrate machine learning benchmarks for a field that was still finding itself, with new models being proposed almost daily. Deep Learning was all the rage — yet more proof of the promise that lay ahead. But that frothy excitement was precisely the problem: everyone was still marveling at what was possible, not yet reckoning with what it would cost. It was nowhere near time for memristor accelerators. I could see where it was all headed, and I had no doubt that energy would be the defining constraint — but a problem that hasn’t arrived yet is indistinguishable from a problem that doesn’t exist.

A Problem In Search of a Solution

Talk: Knowm and Thermodynamic RAM (Mentor Graphics, 2017).

On February 25, 2017, after forming Knowm Inc, I gave a talk about Knowm and Thermodynamic RAM to Mentor Graphics. The room and satellite offices watching the video stream were packed. I remember one slide in particular:

Slide titled The future is asking for synapses, lots of synapses, showing robot examples and a synapse production table.
The 2017 Thermodynamic RAM slide: the future is asking for synapses, lots of synapses.

The point of the slide was simple: transistor production, the most manufactured technological invention in history, had only barely reached the synapse production rate of honey bees. If robotics, autonomy, machine perception, and machine intelligence were going where I thought they were going, then the future would demand manufactured synapses in quantities that made transistor history look like a warm-up act. I made the prediction plainly: the synapse would rival the transistor as the most manufactured item in human history.

Four months later on June 12, 2017, the paper “Attention Is All You Need” was published. Transformers, and perhaps more generally the learned application of attention, went on to revolutionize machine learning and AI. The AI boom did not begin with hardware synapses; that was still a solution in search of a problem. The AI boom began with giant digital models, giant matrix operations, and giant weight files. The future is asking for synapses, but the present answered with the only thing it had: matrix multiplies made possible with gaming accelerators. But you do the best you can with what you have. That’s how nature works.

So why now? Why after all these years am I back at the keys? Because the only thing that will turn a solution in search of a problem into a problem in search of a solution is time. The world had to create the transforms and hit the physical barriers before memristive synaptic substrates could assimilate them. We are still in the boot-loader phase of AI and you ain’t seen nothing yet.

The Cost of Moving Weight

A modern neural network is full of learned, weight-bearing transforms: embedding tables, dense projections, attention query/key/value and output matrices, feed-forward weight matrices, convolutional kernels, state-space parameter matrices, etc. This is the machinery that turns one representation into another. The normal deployment model treats those transforms as weight tensors to be copied and moved, both for distribution to inference infrastructure and within inference memory hierarchies.

At the scale of biology (which modern foundation models have not even remotely touched) moving weights is brutally expensive. You have to store them, transmit them, verify them, synchronize them, move them through memory hierarchies during every inference, and keep enough high-bandwidth infrastructure nearby to keep the whole thing fed. At frontier scale, this concentrates intelligence in data centers and makes them bleed for memory and energy. It’s not a coincidence that we are facing an energy and memory crisis while the AI race heats up. It’s not surprising that chips are going wafer-scale. By accounting for the energy of moving weights, we can see clearly where this is all headed:

$$P_{move} \approx N b r \sigma d V^2$$

Here, N is the number of values being moved, b is bits per value, r is the access rate, and the rest is the ugly physical cost of charging wires across distance at a voltage. We will fight every term in that equation. We will lower the voltage. Shorten the wires. Reduce precision. Make activity sparse. But the largest move is conceptual: stop moving the damn weights in the first place.
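As a rough sketch of how the terms in the equation above trade off, the snippet below evaluates the product directly. The text does not define σ and d individually, so reading σ as wire capacitance per unit length and d as the distance travelled is an assumption, and every numeric value here is an illustrative placeholder rather than a measurement.

```python
def move_power_watts(n_values, bits_per_value, access_rate_hz,
                     cap_per_meter, distance_m, voltage):
    """P_move ~ N * b * r * sigma * d * V^2.

    Assumed interpretation: sigma is wire capacitance per unit length (F/m)
    and d is the distance each bit travels (m).
    """
    return (n_values * bits_per_value * access_rate_hz
            * cap_per_meter * distance_m * voltage ** 2)


# One 4096 x 4096 layer at 16 bits, read 1000 times per second.
N = 4096 * 4096

# Placeholder physical values, chosen only to show the scaling.
short_hop = move_power_watts(N, 16, 1000, 2e-10, 1e-3, 0.8)   # 1 mm of wire
long_hop = move_power_watts(N, 16, 1000, 2e-10, 2e-2, 0.8)    # 20 mm of wire
low_volt = move_power_watts(N, 16, 1000, 2e-10, 1e-3, 0.4)    # half the voltage
```

Because the expression is a plain product, twenty times the distance costs twenty times the power and halving the voltage cuts it to a quarter. Setting N to zero, the move the text argues for, removes the term entirely.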

Copying a dense weight layer scales with the whole matrix: roughly m × n values. Teaching behavior with examples scales with the number of examples and the width of the communicated neural state. A 4096 by 4096 layer at 16 bits is about 268 Mbits of weights. Fifty dense input-output examples are about 6.5 Mbits before sparse encoding. ASSIMILAATE pushes the communicated object smaller still: sparse Activation Address Tuple (AAT) pairs for assimilation, sparse AAT traffic during inference, and local conductance states (‘the weights’) that emerge from the plastic synaptic substrate and are never communicated.
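The arithmetic behind those two figures can be checked in a few lines. The dense numbers follow directly from the text; the sparse estimate at the end is purely hypothetical, assuming 32 active addresses per tuple and 12-bit addresses (enough to index 4096 positions), since the text gives no sparsity level.

```python
def dense_weight_bits(rows, cols, bits_per_value):
    # Copying the layer means moving every entry of the matrix.
    return rows * cols * bits_per_value


def dense_example_bits(n_examples, in_width, out_width, bits_per_value):
    # Each example carries one dense input vector and one dense output vector.
    return n_examples * (in_width + out_width) * bits_per_value


def sparse_pair_bits(n_examples, active_addresses, address_bits):
    # Hypothetical sparse encoding: each side of a pair lists only active addresses.
    return n_examples * 2 * active_addresses * address_bits


weights = dense_weight_bits(4096, 4096, 16)          # 268,435,456 bits
examples = dense_example_bits(50, 4096, 4096, 16)    # 6,553,600 bits

address_bits = (4096 - 1).bit_length()               # 12 bits index 4096 addresses
sparse = sparse_pair_bits(50, 32, address_bits)      # assumed 32 active addresses
```

Under these assumptions the fifty dense examples are roughly 41 times smaller than the weight matrix, and the hypothetical sparse encoding is smaller again by more than two orders of magnitude.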

Let the light thing travel, not the weights.

ASSIMILAATE is built around a blunt and simple rule: don’t move weights. Not for deployment, updates, or inference. Never. The teacher can be enormous and cloud-bound and integrate information from millions of sources. The edge targets are synchronized by receiving examples of what a transform does, the sparse compact activations themselves. The weights emerge, as they must, through intrinsic plasticity. The synapses are physical. They stay local. The transforms are real. They are communicated. The weights are ephemeral. They emerge in place to satisfy the transform under the constraints of the hardware.

The mistake is treating a massive object as though it were weightless. A 4096 by 4096 weight matrix at 16 bits is 268 Mbits. It has inertia. You feel it every time you haul it through memory for another inference pass. The answer is not to make the weights lighter—to quantize the bits, prune the matrix, or compress the tensor. The answer is to acknowledge it’s heavy and to stop lifting it. Let the light thing travel.

Why Memristor Hardware Seems Awkward

When you really dive into the nuts and bolts of the memristor hardware literature it’s a bit painful and awkward. The classic (or ignorant) picture is beautiful on the surface: store weights as analog conductances in a crossbar, apply voltages to rows, read currents from columns, and get vector-matrix multiplication in one shot. On paper it is glorious, and it was one of my first patents over twenty years ago. It’s the sort of idea that comes from enthusiasm but little experience. When you get your head out of the clouds and into the lab, this beautiful picture develops ugly wrinkles and falls apart. You need precise conductance programming. You need DACs and ADCs. You have sneak paths and parasitics. You have device variation, drift, read disturbance, line resistance, temperature effects, and topology mismatch. It’s a mess, and it does not work.

Practical constraints in memristor crossbar hardware compared to idealized assumptions.

The problem is not memristors. The problem is the computational model they’re being asked to serve. The memristor field has been trying to force a stochastic, plastic, physically alive device into the role of a static precision resistor, and then complaining that it misbehaves. Stochastic switching is a defect if your target is exact weight replication. Drift, variation, parasitics: same story. Whether the physics fights you or works for you depends entirely on what you are asking the hardware to do. We have been asking the wrong thing because we don’t really understand what weights are.

What ASSIMILAATE Changes#

ASSIMILAATE stands for Analog Synaptic Systems Implementing Memristor-Integrated Learning and Activation Address Tuple Encoding. It will be the recurring theme for this particular blog series, and I will build it publicly. Yes, it’s a dope backronym. No, I am not apologizing for it. Yes, I am going to turn it into a verb. The central move is to communicate neural state as AATs rather than dense analog values. An AAT is an ordered tuple of activation addresses. Each address is interpreted locally by a receiving neural lane as a synaptic selection. The same communicated AAT can be sent to many targets, and each target interprets it through its own local synaptic organization. AATs are the ultra-lightweight neural activation patterns that I believe are necessary and sufficient to enable AI at scale. If this is not making sense, it’s OK. If you continue with me on this journey you will learn all about it in great detail as I build and simulate.
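As a rough mental model only, an AAT can be pictured as a plain tuple of integers, and a receiving lane as something that looks each address up in its own local state. Everything in the sketch below is hypothetical: the `NeuralLane` class, the dictionary of synaptic values, and the choice to sum the selected synapses are stand-ins for illustration, not the actual ASSIMILAATE design.

```python
from typing import Dict, Tuple

AAT = Tuple[int, ...]  # an ordered tuple of activation addresses

class NeuralLane:
    """Hypothetical receiver that interprets addresses through its own local synapses."""

    def __init__(self, name: str, synapses: Dict[int, float]):
        self.name = name
        self.synapses = synapses  # address -> local synaptic state (illustrative floats)

    def receive(self, aat: AAT) -> float:
        # Each address selects a local synapse; addresses this lane has no
        # synapse for contribute nothing. Summation is an assumption.
        return sum(self.synapses.get(addr, 0.0) for addr in aat)

# One communicated AAT, broadcast unchanged to two different targets.
aat: AAT = (3, 17, 42)
lane_a = NeuralLane("a", {3: 0.5, 17: -0.2, 42: 0.9})
lane_b = NeuralLane("b", {3: -0.4, 17: 0.1, 99: 0.7})

result_a = lane_a.receive(aat)
result_b = lane_b.receive(aat)
print(f"lane a: {result_a:.2f}, lane b: {result_b:.2f}")
```

The message on the wire is three small integers. What it means is decided entirely by the local state of whoever receives it.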

During assimilation, a teacher model produces input-output AAT pairs for its internal transforms. The teacher model can be a normal ML model (subject to some architectural constraint at least for the moment) and its internal activations are converted to AATs via vector quantization methods or, also possible but less likely at first, the model can be trained from the start to utilise AATs. The target hardware presents the communicated AAT pairs to its neural lanes and the adaptive synaptic substrate “decodes” them into local transform states through local learning. The differential memristor synapses adapt in-situ until the local hardware emits the desired output AATs. No weights are transmitted. No attempt is made to force the target’s microscopic conductance state to match the teacher’s weights. Two devices can implement the same transform with different learned conductance states, even in the presence of damage and degradation. The teacher publishes its desired behavior. The target instantiates that behavior on its synaptic substrate from the streamed AAT pairs.
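The claim that two devices can land on the same behavior with different conductances is easy to demonstrate in a toy. The sketch below is a minimal software stand-in under loud assumptions: each synapse is a pair of numbers (Ga, Gb) clipped to a fixed range, the output AAT is a single winning address, and the update is a simple perceptron-style nudge chosen for illustration. It is not the learning rule of the real hardware, and the three-pair stream is invented.

```python
import random

N_IN, N_OUT = 6, 3
G_MIN, G_MAX, STEP = 0.0, 1.0, 0.05  # illustrative conductance range and step

def new_device(seed):
    """Each synapse is a differential pair [Ga, Gb]; the seed gives device-specific starts."""
    rng = random.Random(seed)
    return [[[rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8)] for _ in range(N_IN)]
            for _ in range(N_OUT)]

def emit(device, in_aat):
    """Output AAT: the address of the output with the largest summed (Ga - Gb)."""
    scores = [sum(lane[a][0] - lane[a][1] for a in in_aat) for lane in device]
    return (max(range(N_OUT), key=scores.__getitem__),)

def clip(g):
    return min(G_MAX, max(G_MIN, g))

def assimilate(device, stream, max_epochs=200):
    """Adapt in place until every input AAT yields the desired output AAT."""
    for epoch in range(max_epochs):
        errors = 0
        for in_aat, want in stream:
            got = emit(device, in_aat)
            if got == want:
                continue
            errors += 1
            for a in in_aat:
                # Potentiate the desired output, depress the wrong winner.
                device[want[0]][a][0] = clip(device[want[0]][a][0] + STEP)
                device[got[0]][a][1] = clip(device[got[0]][a][1] + STEP)
        if errors == 0:
            return epoch  # index of the first error-free pass
    return None

# Hypothetical teacher stream of (input AAT, output AAT) pairs.
teacher_stream = [((0, 1), (0,)), ((2, 3), (1,)), ((4, 5), (2,))]

dev1, dev2 = new_device(seed=1), new_device(seed=2)
epochs1 = assimilate(dev1, teacher_stream)
epochs2 = assimilate(dev2, teacher_stream)

same_behavior = all(emit(dev1, x) == emit(dev2, x) == y for x, y in teacher_stream)
print(f"same behavior: {same_behavior}, identical conductances: {dev1 == dev2}")
```

Nothing resembling a weight crossed between teacher and target here. Only the pairs were streamed, and each device found its own internal state that reproduces them.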

At scale, a cloud teacher can publish AAT training streams for a transform—a new capability or an update. Edge devices subscribe to the streams they need, assimilate the behavior locally, validate against held-out examples, and report compact convergence metadata. The devices do not have to share identical microscopic conductance states. This is how billions of people will run frontier models in their pockets and vast fleets of robotic minds will synchronize and share knowledge continuously—without turning the planet into a sea of dystopian datacenters.
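The validate-and-report step can be sketched in the same spirit. This is only one way it might look: the `ConvergenceReport` fields, the `validate` function, the transform name, and the 99% threshold are all hypothetical choices for illustration, and the device is replaced by a lookup table.

```python
from dataclasses import dataclass, asdict
from typing import Callable, List, Tuple

AAT = Tuple[int, ...]
Pair = Tuple[AAT, AAT]

@dataclass
class ConvergenceReport:
    """Hypothetical compact metadata an edge device might send back."""
    transform_id: str
    held_out_count: int
    held_out_accuracy: float
    converged: bool

def validate(transform_id: str, emit: Callable[[AAT], AAT],
             held_out: List[Pair], threshold: float = 0.99) -> ConvergenceReport:
    """Score the locally assimilated transform on held-out AAT pairs."""
    hits = sum(1 for in_aat, want in held_out if emit(in_aat) == want)
    accuracy = hits / len(held_out) if held_out else 0.0
    return ConvergenceReport(transform_id, len(held_out), accuracy, accuracy >= threshold)

# Stand-in for an assimilated device: a lookup table that gets one answer wrong.
learned = {(0, 1): (0,), (2, 3): (1,), (4, 5): (0,)}
held_out = [((0, 1), (0,)), ((2, 3), (1,)), ((4, 5), (2,)), ((0, 1), (0,))]

report = validate("example-transform", lambda aat: learned.get(aat, ()), held_out)
print(asdict(report))
```

The report is a handful of fields regardless of how many synapses sit behind the transform, which is what keeps the return channel light.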

Ultra-Low-Energy Inference#

Once a transform has been assimilated into local memristive hardware, the learned state (the weights) never moves. It is not streamed from DRAM for every inference, or held in SRAM locally. Information is streamed to synaptic cores as AATs, and the receiving neural lanes adapt and return the transformed AATs. Memory access is synaptic access is compute is the transform. All becomes one when the distance between memory and processing becomes zero and we allow our hardware to adapt.

Memory access IS synaptic access IS compute IS the transform

Dense digital systems are very good at arithmetic, but they pay dearly for data movement. ASSIMILAATE spends movement only on the neural state that must be communicated, while leaving the learned physical state in place. The future will need vast numbers of manufactured adaptive synapses, but their learned states will not be shipped around as data. Synapses become abundant, local, adaptive, fault-tolerant physical resources, and the transforms become the communicable object, implied by the activation patterns. The adaptive synaptic substrate makes it all possible. I am reminded of the song lyrics. “And it’s ironic too, because what we tend to do, is act on what they say, and then it is that way”. Catchy tune—and a bit prophetic.


Notice#

I love my life and all life. I am happy, fulfilled, mentally stable and firmly opposed to dramatic exits. I have jumped out of perfectly good airplanes, but only when the plan included a parachute, a landing zone, and somebody else checking the straps. I have no interest in high windows, firearms, or any other shortcut out of this strange gift of being alive. My intended future is long, healthy, and suspiciously ordinary: fishing, soldering circuits, writing, building things, lying in fields of wildflowers, and staring into the infinite universe wondering what is going on.