AI Archaeology
Mining Forgotten Documents

I checked 100 “origin patent” claims for AI archaeology. Here's how often the popular story was wrong

Haruko ・ 2026-05-10 ・ AI Archaeology

For eight days in May 2026, I read patent Claim 1s the way other people read a daily newspaper. One every couple of hours. A hundred of them, end to end.

The plan was small: I wanted to know whether old “origin patents” (the ones tech writers cite when they explain how flash memory or DRAM or the mouse or avobenzone or LiCoO₂ “started”) actually said what people quote them as saying. I'm not a patent attorney. I read them as historical artifacts. What does the front page list as the inventor? What does the assignee field say? What does Claim 1 actually claim? Where does the popular tech-history version line up, and where does it drift?

By the time I'd gone through 100 episodes, I had recorded 55 corrections to the database I started with — a database I'd assembled from secondary sources before doing the work. Not minor stylistic edits. Wrong inventors. Wrong dates by multiple years. Wrong assignee chains. Claim 1 language that didn't match what the popular summary said.

And there was one design pattern that kept showing up, across decades, fields, and continents: the cage. Lock something inside, then make it useful. Across the 100 episodes I counted nine different forms of it.

This is what eight days of reading told me.

The method, briefly

The unit of work was an “episode”: pick a candidate patent, pull the Claim 1 text from a primary source where possible (Google Patents PDF for old US grants, USPTO Patent Public Search where it cooperated, EPO Espacenet for European filings), compare against the popular tech-history narrative, and write up the gap. Total run: 100 episodes across four sub-series — Patent Archaeology, Hardware/Energy Patents, Software UI Patents, and a Cosmetic/Pharma branch.

Where I couldn't reach the primary source (for example, 1900s German Reichspatents that DPMA hasn't fully digitized, or older Japanese filings on J-PlatPat that require interactive search), I logged the wall instead of inventing around it. Failure to retrieve turned out to be a finding in its own right, and I'll come back to it.
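As a rough sketch, one episode could be recorded in a structure like the one below. The class, field names, and the two retrieval labels are illustrative assumptions, not the schema actually used for the 100 episodes; the only rule taken from the text is that an unreachable source gets logged as a wall rather than filled in with a guess.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Retrieval(Enum):
    """How far the primary-source lookup got (hypothetical labels)."""
    PRIMARY_TEXT = "primary_text"  # Claim 1 was read from a primary source
    WALL = "wall"                  # primary source unreachable; logged, not guessed


@dataclass
class Episode:
    """One unit of work: a candidate patent compared with its popular story."""
    subject: str                        # short label for the invention
    patent_number: Optional[str]        # None when no verifiable number was found
    source: Optional[str]               # e.g. "Google Patents PDF", "EPO Espacenet"
    retrieval: Retrieval
    popular_claim: str                  # what the secondary sources say
    claim_1_text: Optional[str] = None  # verbatim text, only if actually retrieved
    gaps: list[str] = field(default_factory=list)  # differences written up

    def __post_init__(self) -> None:
        # A walled episode must not carry claim text: the wall itself is the finding.
        if self.retrieval is Retrieval.WALL and self.claim_1_text is not None:
            raise ValueError("walled episode cannot carry Claim 1 text")


# Placeholder record for a source that could not be reached.
walled = Episode(
    subject="example 1900s German Reichspatent",
    patent_number=None,
    source=None,
    retrieval=Retrieval.WALL,
    popular_claim="popular summary goes here",
)
```

Making the wall an explicit state, instead of an empty string in the claim field, keeps "not retrieved" distinguishable from "retrieved and blank" when the episodes are counted later.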

Five patterns of how the popular story was wrong

Sorted, the 55 corrections collapse into five recurring shapes. Counts are approximate because some corrections cross categories.

PATTERN 1 (~12 cases)
Inventor attribution wrong

Engelbart's mouse, US3541541 (filed 1967, granted 1970): popular story says 'Engelbart and Bill English co-invented it.' The patent itself lists Douglas C. Engelbart as sole inventor. English was the SRI implementation engineer who later ran the Mother of All Demos. Joint credit got back-projected onto the legal filing.

PATTERN 2 (~14 cases)
Dates off by years

Masuoka's flash memory cell, US4531203A: many secondary sources say 'patented 1982.' The US application was filed 1981-11-13 and granted 1985-07-23, claiming priority from a Japanese application filed 1980-12-20. The three-year gap between the quoted 1982 and the 1985 grant matters when you're tracing the relationship between IEDM 1984 (the 'flash' name origin) and the underlying structure patent.
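The gaps between those dates are easy to check. A minimal sketch, using only the three dates quoted above for US4531203A; the helper name is made up for illustration, and the 365.25-day year is an approximation.

```python
from datetime import date

# Dates as quoted in the text for US4531203A.
priority = date(1980, 12, 20)  # Japanese priority filing
filed = date(1981, 11, 13)     # US filing
granted = date(1985, 7, 23)    # US grant

popular_year = 1982            # the "patented 1982" version


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, using a 365.25-day year (approximation)."""
    return (end - start).days / 365.25


print(f"priority to grant:  {years_between(priority, granted):.1f} years")
print(f"US filing to grant: {years_between(filed, granted):.1f} years")
print(f"popular year vs grant year: {granted.year - popular_year} years")
```

Keeping priority, filing, and grant as separate dates is what makes the "patented in year X" shorthand checkable: the popular year can be compared against each of them instead of against a single ambiguous "patent date".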

PATTERN 3 (~9 cases)
Assignee chain collapsed

Avobenzone (1973), the UV filter still in your sunscreen: popular tracking stops at 'Givaudan.' The actual chain is Roure Bertrand Dupont SA → 1991 Givaudan-Roure → 2000 DSM Nutritional Products → 2023 dsm-firmenich. Patent rights, royalties, and the institutional memory of who actually filed move with these mergers; the popular story papers over all of them.

PATTERN 4 (~10 cases)
Claim 1 paraphrased to the point of distortion

Viterbi's algorithm patent: explanations online describe it as 'a method for decoding convolutional codes.' Claim 1 is structurally about the maximum-likelihood path-selection apparatus with specific buffer and survivor-state mechanics. The pop summary loses the structural element that made it patentable in 1967.

PATTERN 5 (~10 cases)
Information walls — patent number missing or unverifiable

Lifschütz's 1902 lanolin emulsion (the foundation of Eucerit and Beiersdorf's whole century): I couldn't find a verifiable DRP (Deutsches Reichspatent) number in 13 public-facing sources. DPMA DEPATISnet doesn't fully digitize 1900s DRPs — it requires interactive UI navigation. AI-suggested candidate numbers (DRP 132307, 154959, 171146) didn't return verifiable hits. The popular story exists; the underlying claim text is sealed behind a database wall.
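The approximate counts on the five pattern cards add up to exactly 55, which would be consistent with each correction being filed under one primary pattern even when it touches several. The sketch below checks that sum and shows the difference between counting by primary label and counting every label. The label names and the three sample records are hypothetical, not entries from the actual database.

```python
from collections import Counter

# Approximate per-pattern counts as reported on the five pattern cards.
reported = {
    "inventor_attribution": 12,
    "dates_off": 14,
    "assignee_chain": 9,
    "claim_1_distorted": 10,
    "information_wall": 10,
}
TOTAL_CORRECTIONS = 55

assert sum(reported.values()) == TOTAL_CORRECTIONS


def tally_primary(corrections: list[list[str]]) -> Counter:
    """Count each correction once, under its first (primary) label."""
    return Counter(labels[0] for labels in corrections if labels)


def tally_all(corrections: list[list[str]]) -> Counter:
    """Count every label, so a cross-category correction counts more than once."""
    return Counter(label for labels in corrections for label in labels)


# Hypothetical records: the second one crosses two categories.
sample = [
    ["dates_off"],
    ["inventor_attribution", "dates_off"],
    ["information_wall"],
]

primary = tally_primary(sample)
every = tally_all(sample)
print(sum(primary.values()), "corrections,", sum(every.values()), "labels")
```

Counting by primary label keeps the per-pattern numbers summing to the number of corrections; counting every label shows how much overlap the "approximate" caveat is hiding.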

Why this drift happens

None of these are conspiracies. They're what happens when patent text gets passed through several rewrites — a press release at filing, a Wikipedia paraphrase ten years later, a textbook quoting the Wikipedia version, a tech blog quoting the textbook — before reaching the reader who's “explaining” the origin. Each step smooths the legal language and adds the social context the writer cares about. By the fourth or fifth hop, what's left is a story that fits the field's narrative arc, with the patent's actual structural claims sanded off.

The frustrating case is Pattern 5 — information walls. Even if you want to read the original, you can't always get there. The 1900s DRPs are a good example: every German cosmetics or pharma history that mentions Eucerit or Lifschütz cites a popular story that's probably correct in spirit, but the underlying patent number — the thing that would let you verify Claim 1 — is sealed behind an interactive search UI on DPMA DEPATISnet, which automated tooling can't cleanly traverse. Three centuries of patent law collide with twenty years of OCR limitations and you get a database that knows the answer but can't hand it to you in machine-readable form.

The pattern that kept showing up: the cage

The thing I didn't expect was how often the same design philosophy appeared in patents that, on the surface, are doing completely different things. Storing data, burning fat, lighting a room, releasing a fragrance, running a programming language. The shared move: confine something — electrons, charges, photons, molecules, ions, heat, or even software know-how — and make the confinement itself the useful structure.

I'm calling these cage patents. Across the 100 episodes I counted nine forms:

FORM #1, Electron cage: Masuoka 1980, US4531203A (flash floating gate)
FORM #2, Charge cage: Dennard 1968, US3387286 (1T1C DRAM)
FORM #3, Photon cage: Bell Labs semiconductor laser patent family
FORM #4, Molecular cage: Cyclodextrin and clathrate sustained-release patents
FORM #5, Ion cage: Goodenough 1980s, LiCoO₂ intercalation patents
FORM #6, Thermal cage: MEMS package thermal-isolation patents
FORM #7, Logic cage 1 (pre-judicial era): Backus 1957, FORTRAN (never patented)
FORM #8, Logic cage 2 (unsettled era): Atkinson 1985, HyperCard (never patented)
FORM #9, Logic cage 3 (forced/voluntary openness): BBN IMP 1969 / Xerox PARC Smalltalk 1972

Forms 1–6 are physical: literal walls of oxide, dielectric, layered crystal, host molecule, or insulating substrate, with something trapped inside. Forms 7–9 are something stranger: they're cages made of doctrine. Pure software inventions before Gottschalk v. Benson (1972) couldn't be patented at all; IBM caged FORTRAN's know-how through manuals released first and code distributed for free. Atkinson's HyperCard (1985–87) hit the unsettled era between Diamond v. Diehr (1981) and State Street Bank (1998); Apple caged it through a bundled-distribution contract instead of a patent. ARPA's contract terms forced the BBN IMP design (1969) into the public domain via Report 1822 → DDC → RFC; Xerox PARC voluntarily released Smalltalk-80 for unrestricted redistribution in 1981. Whether the openness was forced or chosen, leaving the design unpatented caged a design specification inside the industry's shared vocabulary, which turned out to be more durable than any 17-year exclusion right.

The reason this matters for “AI archaeology”: when you're trying to predict where the next bottleneck is, the cage patents are where the bottleneck has historically lived. Flash storage limits, DRAM scaling, photonic compute density, drug bioavailability, EV cathode chemistry — every one of these has a Claim 1 somewhere that defines what gets confined and how. That's where the engineering slack is. That's where the next 5x lives, or doesn't.

What this is, and what this isn't

This is technical history and a market hypothesis. It is not legal analysis. I'm a non-lawyer reading public patent documents because I'm curious about how the original Claim 1 language compares to what people quote. Claim scope, infringement, prosecution history, doctrine of equivalents — those are practitioner questions and I don't pretend to answer them. If you're a patent attorney and you spot a claim I've misread, please tell me; I'll correct it the same way I corrected the 55 entries already.

The five patterns above aren't evidence that secondary tech writing is systematically dishonest. They're evidence that any primary source, quoted across enough hops, drifts. The cage finding isn't a theory of everything; it's one design pattern that recurred with surprising frequency in the 100 episodes I happened to work through in May 2026.

I'd like to know whether patent practitioners reading this see the drift-by-paraphrase pattern as something they encounter routinely, and whether the cage framing tracks with how they categorize structural claims. If you've seen better systematic ways to read old Claim 1s as historical artifacts, I'm interested.


The 100 episodes are at ai-archaeology.vercel.app/en. A long-form treatment of the nine cage forms — Volume 2 of the AI Archaeology book series, with Claim 1 verbatim for every form — is in preparation for June 2026. Notification signup: Cage Patents preview.