memristors · ahah-plasticity · knowm-history · darpa-synapse
Chapter 2: Before I Knew of the Word Memristor
Twenty-five years assembling the synaptic solution.
By Alex Nugent
In the previous post I laid out what ASSIMILAATE is and why I think transforms—not weights—are the unit of intelligence that gets shipped. This is part of the long walk that got me there.
Before I Knew of the Word Memristor
In 2001 I was a junior undergraduate physics major with a new mission. A year earlier I had stumbled into neural networks after a random walk through my college library. I was frustrated with quantum mechanics, which I was supposed to be studying, so I got up and wandered around. On the third floor I found Jim Jubak’s In the Image of the Brain: Breaking the Barrier Between the Human Mind and Intelligent Machines. It was about neuromorphic computing, neural networks, and the idea that brains were not just digital computers running better software. They were a different kind of machine.
I became obsessed. I read everything I could find, learned about backpropagation of error, programmed small neural networks, and took any course with “Neuro” in the title. The conclusion I came to: the separation of memory and processing would be the key barrier to future AI. A brain did not shuttle bits back and forth between a memory bank and an arithmetic unit. A brain did it all in the same physical structure at the same time. The structure that seemingly allowed this to happen without burning gigawatts was the synapse. So I started to wonder how I could build one.
My first idea for building a physical neural network was to somehow shatter glass and fill the cracks with conductive material. I pictured one of those crystal balls an oracle might hold, fractured into networks. It was a beautiful but not-even-half-baked idea, and I had no idea whatsoever how to make it work. I quickly abandoned it for another idea that I could wrap my little undergraduate physics head around. I only mention it as a sort of foreshadowing of what’s to come—one of those strange premonitions that makes you think reality and time may not work exactly like we think.
My second idea for an artificial synapse was a resistance-changing connection made from nanoparticles in a colloidal suspension between electrodes on an integrated chip. Apply a voltage across a gap, induce dipoles in the particles, pull them into the gap, and form a bridge. The bridge would change the resistance, the resistance would be the synaptic weight, and the act of applying voltage would be the act of adaptation. I did not know the word memristor yet. I did not know Leon Chua had hypothesized a missing circuit element almost a decade before I was born. I was just an enthusiastic physics student with a growing suspicion that adaptive resistance was the physical primitive I wanted.
I kept asking my mother (Knowm Inc partner Hillary Riggs) why nobody was doing this. After many months of asking the same question, she said something to the effect of: “shut up and do it.” Let’s get a patent. Patent attorneys Kermit Lopez and Luis Ortiz joined shortly after as partners and the first Knowm patents were born, with many more to come. Those first patents are now expired — over 20 years (!!) — but we never stopped filing. The most recent Knowm application went in yesterday.
The Synapses Were Pretty Shitty
The first technical problem was that a learned weight needs to go positive and negative, but a single resistance cannot become negative. The answer was a differential pair. One resistive element contributes the positive side, one contributes the negative side, and the signed synaptic value is the difference between them. Differential pair memristor synapses are one of those things that become obvious once you see them. The fact that seemingly all of the natural world is formed from adaptive, competing differential energy dissipation pathways had yet to dawn on me, but I’ll be talking a lot more about that. That would come during my time advising the DARPA Physical Intelligence program after Todd Hylton and I successfully launched the DARPA SyNAPSE program. But I digress.
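To make the differential pair concrete, here is a minimal Python sketch. The function names `signed_weight` and `neuron_output` and the conductance values are hypothetical, chosen only for illustration; they are not taken from any Knowm circuit.

```python
# Illustrative sketch only: names and values are assumptions, not a Knowm design.

def signed_weight(g_pos, g_neg):
    """Signed synaptic weight as the difference of two non-negative conductances."""
    if g_pos < 0 or g_neg < 0:
        raise ValueError("a physical conductance cannot be negative")
    return g_pos - g_neg


def neuron_output(inputs, pairs):
    """Weighted sum of inputs, where each weight is one differential pair."""
    return sum(x * signed_weight(gp, gn) for x, (gp, gn) in zip(inputs, pairs))


# Two synapses (conductances in siemens, hypothetical values):
# the first pair is net positive, the second net negative.
pairs = [(5e-6, 2e-6), (1e-6, 4e-6)]
y = neuron_output([1.0, 0.5], pairs)
print(y)
```

Neither element ever holds a negative value; the sign lives entirely in which side of the pair is larger.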
The second technical problem with synapses was more confounding. My particle synapses were volatile, noisy, unstable, and annoying in all the ways actual physical devices are annoying (in particular small ones). For a while I tried to ignore it, to treat it as a defect and ‘sweep it under the rug’, but then I realized biology had the same problem. Biological synapses are not precision components. They are sketchy little adaptive structures in a dynamic (living) system that is constantly repairing itself. Intelligence must be at a minimum compatible with volatility, if not require it. Maybe repair in the face of volatility is not a side effect of intelligence. Maybe that is intelligence or perhaps a necessary ingredient. These were the ideas being floated while I advised the DARPA SyNAPSE and Physical Intelligence programs and from which Todd and his growing team of SETAs tried—and failed—to launch the DARPA Thermodynamic Computing program.
My history in this line of thought goes back to my work at Los Alamos National Laboratory (LANL) with Anti-Hebbian and Hebbian (AHaH) plasticity. After finishing my undergraduate physics degree I spent a year at LANL and found myself on a project related to future nano-scale computing systems. I worked with Reid Porter (Space Data System) and Garret Kenyon (Computational Neuroscience), and my research was heavily influenced by my seemingly shitty synapses. I trained neural network classifiers, damaged their weights and neurons, and then tested synaptic plasticity rules to see if any of them could repair the learned state while the network operated. Almost all of the rules failed. Some drove the weights to infinity. Some drove them to zero. Some oscillated. Some destroyed the classifier and marched performance straight to random choice. Then one actually worked, and we proceeded to work backwards from that.
That was the moment I stopped thinking of volatility as merely a hardware problem. Volatility was a clue, and I had what appeared to be a tiny foothold on a solution. If the devices could adapt locally via plasticity rules operating on structured information, then the hardware did not need to be perfect. Not only that, but the exact state of the synapses at any given time did not really matter, because the optimal set of weights was determined as much by the physical state of the network as by the information it was processing. If the neural state recovers after damage because the weights adapt to compensate, then the specific weights don’t matter. AHaH plasticity was acting to keep the neuron in its attractor, an attractor defined by the structure of the information and how the neuron’s transfer function related to that information, not by its weights. Hence the weights changed and the neuron’s transform was repaired. The information for self-assembly is in the datastream.
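The claim that the output recovers while the weights land somewhere new can be shown with a toy example. The sketch below assumes a plasticity rule of the form Δw_i = x_i (α·sign(y) − β·y), one form that has been used to describe AHaH plasticity. The constants, the two input patterns, and the "damage" are all made up for illustration; this is not the LANL experiment.

```python
# Toy sketch: a single linear neuron under an assumed AHaH-style rule.
# ALPHA, BETA, the patterns, and the damage are hypothetical illustration values.

ALPHA = 0.1  # Hebbian term strength (assumed)
BETA = 0.1   # anti-Hebbian term strength (assumed)


def output(w, x):
    """Linear neuron output y = w . x"""
    return sum(wi * xi for wi, xi in zip(w, x))


def ahah_step(w, x):
    """One update: dw_i = x_i * (ALPHA * sign(y) - BETA * y)."""
    y = output(w, x)
    sign = 1.0 if y >= 0 else -1.0
    f = ALPHA * sign - BETA * y
    return [wi + xi * f for wi, xi in zip(w, x)]


def run(w, patterns, epochs):
    """Present every pattern once per epoch, updating the weights each time."""
    for _ in range(epochs):
        for x in patterns:
            w = ahah_step(w, x)
    return w


# Two opposite input patterns: the "structure" in the data stream.
patterns = [(1.0, 0.5), (-1.0, -0.5)]

w_trained = run([0.3, 0.2], patterns, 200)

# Hypothetical damage: one weight decays, the other drifts.
w_damaged = [w_trained[0] * 0.2, w_trained[1] + 0.4]
w_repaired = run(w_damaged, patterns, 200)

print("output before damage:", output(w_trained, patterns[0]))
print("output after damage: ", output(w_damaged, patterns[0]))
print("output after repair: ", output(w_repaired, patterns[0]))
print("weights before/after:", w_trained, w_repaired)
```

In this toy case the output settles back at α/β, while the repaired weights differ from the trained ones: the rule only moves the weights along the direction of the input, so whatever the damage did in other directions stays. The state that is restored is the neuron's response to the data, not a particular weight vector.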
DARPA: SyNAPSE, Physical Intelligence and Thermodynamic Computing
There is a much longer story here, and I have told part of the first decade of it here: Knowm History 2001-2011. The short version is that after dropping out of graduate school (PhD EE), moving back to Santa Fe, New Mexico, and trying to keep the whole thing alive with a PC and stubbornness, Hillary and I eventually found our way to Washington, DC. We met with the Office of Naval Research. We met with the Small Business office of the National Science Foundation. I gave a talk at the patent office. The NSF meeting was, shall we say, “character-building”. The NSF official I met told me that “we don’t fund science projects” and to get out of his office. I will not say his last name but it sounds somewhat like “Arch Dick”. That was my first hard lesson in the politics of innovation: while the idea is a necessary component, it is very much up for grabs by the Beltway bandits. The rest is access and status. People for the most part don’t listen to the message unless it comes from the right messenger. The information is not valued. Rather, it’s the container of the information that people value. Which is why our last meeting on that trip was ultimately the most important.
Our last stop was to meet Todd Hylton, who was then a director at the nanotechnology division of SAIC. Todd listened and had the physics background to understand what I was saying. He also seemed to have that same sense that ‘something else is out there’ and he was searching for it. He had the institutional position to walk into rooms that would not take me seriously. We could both say exactly the same words, but whereas I would be ridiculed and denigrated, he might command at least a moment of attention. I remember his words very clearly. “I am having a hard time right now determining if you are a genius or insane, but I’d like to keep working with you to better understand this”. And we did. That collaboration with Todd led to his hiring at DARPA as a program manager by then-director Tony Tether for what became the Systems of Neuromorphic Adaptive Plastic Scalable Electronics program, better known as DARPA SyNAPSE. In 2008, just as HP was announcing that it had “discovered” the memristor, the SyNAPSE program kicked off with HP, IBM, and HRL as major performers. I spent the next four years on the government advisory team watching millions of dollars being spent with top US research companies attacking neuromorphic adaptive hardware.
I learned a lot. Some of it was technical, some of it was political, and some of it I can’t talk about. It was the kind of education you only get by being close enough to the machine to smell the oil. Some of it was amazing, and some of it stunk. But the core lesson never changed: if we want brain-scale computation in an energy and volume budget that makes sense, we need adaptive physical memory at the site of computation. We need synapses, and those synapses will likely be a little volatile.
Finding the Self-Directed Channel Memristor
DARPA has a rule that was central to its function. Program managers could only stay for about five years. I’m not sure if that is still how it works now. The idea is that DARPA’s role is to “prevent technological surprise”. It has to avoid the calcification that is rampant in other areas of the military-industrial complex, where the same people with the same stale ideas become entrenched, control all the resources, and direct them to the same friends. So after those years Todd had to go, and with him all his “SETAs” (Science and Engineering Technical Advisors), which included me.
I met Robinson Pino, then with Air Force Research Labs, on a SyNAPSE program site visit. I forget the exact one—probably IBM in Palo Alto or HRL labs in Malibu. We talked for a few hours over drinks in a hotel bar. He encouraged me to apply for an SBIR grant and pointed me to the exact solicitation: “VLSI Building Blocks for Future Autonomous Air Vehicles”. After flying back to Santa Fe I got to work. I had never directly applied for government funding and there was a lot of administrative work to do and a proposal to write. I had less than a week. But with Hillary’s help we did it and “M. Alexander Nugent Consulting” was awarded a contract. That set my post-DARPA path. I was going to manage my own small-scale research effort, perhaps finally being able to realize the goal I had embarked on so many years before. I immediately hired my long-time friend and Knowm Inc partner Tim Molter to assist with the effort.
As part of this SBIR program I formalized my thoughts on the volatile synapse with a mathematical model of a memristor. It was an extension of my US patent 7,599,895, “Methodology for the configuration and repair of unreliable switching elements” (US7599895B2). I built the model from collections of metastable switches and called it the generalized metastable switch (MSS) memristor model: The Generalized Metastable Switch Memristor Model.
It was a simple but effective memristor model, powered at its core by stochastic switching. By tweaking the transition probabilities and the number of metastable switches, one could model a wide range of devices. The SBIR effort was purely theoretical and simulation based, but of course it had to prove that the ideas would map to ultra-low-power hardware, and thus everything had to be built up from physical models of devices. Tim and I first published this work as part of our AHaH Computing paper in 2014, but the ideas that underpinned it had been percolating for a decade.
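As a rough illustration of building a memristor out of stochastic switches, here is a minimal Python sketch. It treats the device as a population of two-state switches whose transition probabilities depend on the applied voltage. The logistic form of those probabilities, the class name `MetastableSwitchMemristor`, and every parameter value are assumptions made for this example, not the published model or a fit to any real device.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


class MetastableSwitchMemristor:
    """A population of two-state switches; conductance is the sum over switches.

    All parameter values are hypothetical illustration values.
    """

    def __init__(self, n=1000, g_on=1e-6, g_off=1e-8,
                 v_on=0.2, v_off=0.2, alpha=0.05, beta=38.5, seed=0):
        self.n = n          # number of switches
        self.g_on = g_on    # per-switch conductance in the ON state (S)
        self.g_off = g_off  # per-switch conductance in the OFF state (S)
        self.v_on = v_on    # threshold for OFF -> ON transitions (V)
        self.v_off = v_off  # threshold for ON -> OFF transitions (V)
        self.alpha = alpha  # maximum transition probability per time step
        self.beta = beta    # sharpness of the thresholds (1/V)
        self.n_on = 0       # start with every switch OFF
        self.rng = random.Random(seed)

    def conductance(self):
        return self.n_on * self.g_on + (self.n - self.n_on) * self.g_off

    def step(self, v):
        """Apply voltage v for one time step and flip switches at random."""
        p_on = self.alpha * logistic(self.beta * (v - self.v_on))
        p_off = self.alpha * (1.0 - logistic(self.beta * (v + self.v_off)))
        n_off = self.n - self.n_on
        to_on = sum(self.rng.random() < p_on for _ in range(n_off))
        to_off = sum(self.rng.random() < p_off for _ in range(self.n_on))
        self.n_on += to_on - to_off


device = MetastableSwitchMemristor()
g_start = device.conductance()
for _ in range(100):
    device.step(0.5)    # positive bias drives switches ON
g_high = device.conductance()
for _ in range(100):
    device.step(-0.5)   # negative bias drives them back OFF
g_low = device.conductance()
print(g_start, g_high, g_low)
```

Changing `n`, the thresholds, or `alpha` changes how smooth, how fast, and how noisy the conductance response is, which is the sense in which a handful of parameters can be tuned toward different devices.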
It was at this time that I was made aware of a memristor device from Kris Campbell at Boise State University. I tweaked the parameters of the MSS model to fit an I-V curve from a published paper and it seemed to fit nicely. I heard that some folks from AFRL were going to visit Dr. Campbell in Boise and I endeavored to tag along. I wanted to meet the guy. I only discovered once I arrived that he was actually a she. My preconceptions were undoubtedly influenced by the fact that over the whole SyNAPSE and PI programs I think I saw two women performers, and one of those transitioned from a man later. It’s an absurdly male-dominated field, for the worse I might add, and Kris immediately stood out to me. Not just because she was a woman but because she had seemingly already accomplished something that so many other groups were flailing to achieve, and yet this was the first I had heard of it.
Remember my less-than-half-baked idea about building a neural network from “fractured glass”? The devices Kris was fabricating were ion-conducting devices built around a Ge-Se glass layer through which silver ions migrate under applied bias to form and dissolve a conductive channel. The fractured glass solid electrolyte layer engineered into the stack provides preferential pathways that confine where the Ag channel forms and reforms across cycles. This confinement is what gives the self-directed channel (SDC) device its edge compared to the filament formation that limits conventional CBRAM cells. The result is a low-current, low-voltage, high-endurance, high-yield, high-temperature memristor well suited to the commercialization of analog neuromorphic learning architectures. Deposited in a single sputter pass, no thermal anneal, CMOS back-end-of-line compatible. The kind of process you can actually move into a fab.
Tim and I made rapid progress on the SBIR program, which progressed through phase 1, phase 2 and a phase 2 extension over the years. We also received an additional STTR program with AFRL, and a phase-1 SBIR with the Office of Naval Research. It was an extremely busy and productive time. In the course of the work that led to Thermodynamic RAM, Tim and I completed our goals ahead of schedule and under budget. We had money to spare. I contacted Kris Campbell to see if she could create a memristor specifically suited to AHaH Computing. She said ‘probably’, so that’s where we spent that money. Kris delivered on that “probably” and my consulting company MANC obtained the exclusive worldwide license. This would later be transferred to Knowm Inc. Over a decade later, Knowm SDC memristors are still the only commercially available memristors researchers can buy off the shelf. A decade of independent characterization across dozens of countries has accumulated around them, the largest such body of work on any memristor. That time may be coming to an end, but I’ll get to that.
Knowm Inc
I did not intend to form Knowm Inc from the start. The idea that I could build a singular company that would bring this technology to the world seemed a bit absurd to me. The existing computing technology stack is MASSIVE in its complexity. Layers and layers of hardware. Layers and layers of software. Bridging even a couple of those layers is monumental. Bridging them all is borderline psychotic. It’s an enormous problem that requires so many people and some fantastically expensive and sophisticated pieces of equipment. That said, my work and the companies holding it were scattered, and it felt like it all needed to be brought together more coherently. We had “KnowmTech LLC”, the IP holding company Hillary and I formed with Kermit and Luis. I had MANC, a sole proprietorship Hillary helped me form for my consulting work with DARPA, which was later used for the SBIR/STTR programs and which held the SDC memristor licenses. While in Minnesota at an extended family gathering I discussed what I was doing with Sam Barakat, a relative on my mother’s side. Sam lived in the Bay Area, had built and sold a successful consulting company, and had a lot of experience navigating, and suing, large institutions. He expressed great interest in investing and Knowm Inc was formed not long after.
The IP and experience from KnowmTech and MANC were consolidated into Knowm Inc. It was both exciting and very stressful. I was still in the middle of SBIR/STTR contract work and pulled in more directions than I could gracefully manage. It’s almost impossible to actually do technical work and also be a CEO. I recall when Bryant Wysocki at AFRL, who later rose to chief engineer, communicated that the upper brass wanted to “see headlines”. So that is what we did. Newly formed and with Sam’s investment, we hired a Bay Area marketing firm. It was my first real introduction to the Silicon Valley hustle. I’ll be honest, I absolutely fucking hated it.
Before that moment I was naive and thought the world worked a bit differently. I thought the articles I read about science and engineering discoveries were actual reporting, organic and earned. The marketing company opened a Rolodex and lined up interview after interview. They provided “advance copy” and the promise of a pre-press-release scoop to the “journalists”. I gave countless briefs and calls and conferences. I learned that almost the whole thing is pay-to-play and that the reason I saw so many breathless announcements from name-brand universities was that it was all just a big hustle behind the curtain.
Knowm was immediately pitted against HP in the press, which I found a bit ridiculous. Money evaporated fast. But I could see how directly it moved public perception, and that was honestly depressing to me. The CIA, via its venture arm In-Q-Tel, called and wanted to talk about buying a board seat. The NSA called a bit later for what seemed like a ‘patriot check’; I’m not really sure. Apple invited us to Infinite Loop. Representatives from Samsung traveled to my home in Santa Fe. The problem shifted from technology and conceptually unlocking the mysteries of the universe to finding ways to pretend to compete with organizations that were vastly better resourced than Knowm to build chips that would accelerate algorithms that had yet to emerge. I was told again and again to “fake it until you make it”, but honestly…no thanks. There is no “faking” technology and the mere suggestion of that pisses me off. That’s how you end up on the cover of Forbes followed by a stint in prison. The problem of using greed and fear to attract attention and form alliances to unlock biased human perception is not the problem I want to solve. I was not attracted to the hustle or the attention. I just wanted to understand and unlock the mysteries of the universe and Nature. I still do.
And you know what? Knowm was a solution in search of a problem — and that’s the cruel irony of foresight. People don’t invest in problems they haven’t felt yet. Machine learning hadn’t yet demonstrated its potential, so the energy wall it would eventually run into didn’t exist. I was being asked to demonstrate machine learning benchmarks for a field that was still finding itself, with new models being proposed almost daily. Deep Learning was all the rage — yet more proof of the promise that lay ahead. But that frothy excitement was precisely the problem: everyone was still marveling at what was possible, not yet reckoning with what it would cost. It was nowhere near time for memristor accelerators. I could see where it was all headed, and I had no doubt that energy would be the defining constraint — but a problem that hasn’t arrived yet is indistinguishable from a problem that doesn’t exist.
A Problem In Search of a Solution
On February 25, 2017, after forming Knowm Inc, I gave a talk about Knowm and Thermodynamic RAM to Mentor Graphics. The room and the satellite offices watching the video stream were packed. I remember one slide in particular.
The point of the slide was simple: transistor production, the most manufactured technological invention in history, had only barely reached the synapse production rate of honey bees. If robotics, autonomy, machine perception, and machine intelligence were going where I thought they were going, then the future would demand manufactured synapses in quantities that made transistor history look like a warm-up act. I made the prediction plainly: the synapse would rival the transistor as the most manufactured item in human history.
Four months later, on June 12, 2017, the paper “Attention Is All You Need” was published. Transformers, and perhaps more generally the learned application of attention, went on to revolutionize machine learning and AI. The AI boom did not begin with hardware synapses; that was still a solution in search of a problem. The AI boom began with giant digital models, giant matrix operations, and giant weight files. The future was asking for synapses, but the present answered with the only thing it had: matrix multiplies made possible with gaming accelerators. But you do the best you can with what you have. That’s how nature works.
So why now? Why after all these years am I back at the keys? Because the only thing that will turn a solution in search of a problem into a problem in search of a solution is time. The world had to create the transforms and hit the physical barriers before memristive synaptic substrates could assimilate them. We are still in the boot-loader phase of AI and you ain’t seen nothing yet.