THEME

Introduction & Methodology

What AI Archaeology is, why we do it, how to write it, pitfalls, and synthesis.

4 episodes

AI ARCHAEOLOGY SYNTHESIS #1
Phase 1 100 Episodes Complete ── Mapping the Four Structural Excavation Axes That Emerged in 8 Days, 29 Sessions: DB Reliability 6 Forms, Eligibility Wall 4 Forms, Cage Patents 9 Forms, 12 Sub-series
AI Archaeology Synthesis #1 ── A retrospective note on the run from the series launch on 2026-05-01 (ep01-07 completed the same day) to ep100 on 2026-05-08 (Day 29). The run spans 8 calendar days end-to-end; the 29-session, 100-episode plan itself began on 2026-05-06 (Day 1) and was packed into just 3 days. The series can be re-read not as a sequence of individual episodes but as four structural excavation axes: a typology of mismatches between primary patent records and conventional accounts; a judicial-history account of why software inventions were not patented; a design-philosophy genealogy of inventions that 'confine in order to use'; and the 12-sub-series map of what the series ended up covering.
From the AI Archaeology series launch on 2026-05-01 (ep01-07 completed the same day) through ep100 on 2026-05-08 (Day 29), the series reached 100 episodes (31 excavation notes + 69 excavation memos) in 8 calendar days end-to-end. The 100-episode plan portion, from Day 1 on 2026-05-06 through Day 29, was a 3-day, 29-session high-density operation; 'Day' here is a session number, not a calendar day (5/7 alone hosted Day 3 through Day 13, and 5/8 hosted Day 14 through Day 29). This synthesis note does not introduce individual episodes; it takes stock of what the 100 episodes excavated, organized along four axes.
Axis 1 is 'DB reliability 6 forms', which summarizes 57 cumulative DB corrections and 13 confirmations from Day 8 onward as six recurrent forms: (1) wrong-number swaps, where the listed patent points to a completely different invention (cosmetics 5+ cases, food-health 1 case); (2) marketing-phrase misreadings, where the patent number is real but the conventional 'Claim 1 subject' diverges from the verbatim claim (CS-009 P&G niacinamide, conventionally 'whitening' vs verbatim 'regulating mammalian skin pore size'); (3) information-wall via interactive search UI (CS-004 DPMA, CS-005 USPTO 1970s, CS-010 J-PlatPat as a three-form set); (4) absence (PH-004 Köhler-Milstein, CS-002 botulinum toxin, multiple SW absences); (5) eligibility wall (7 SW cases); and (6) information-wall via OCR failure (SW-007 Lapson, second inventor field garbled).
Axis 2 is 'Eligibility Wall 4 forms', established on Days 24-26 as a structural classification of why US software inventions from 1957 through the 1990s were not patented: (a) pre-judicial era, with 4 sub-forms (FORTRAN/LISP/ALGOL/COBOL); (b) unsettled era (HyperCard); (c) government-contract forced disclosure (BBN IMP, Bell-LaPadula); (d) corporate-strategy voluntary disclosure (Smalltalk).
Axis 3 is 'Cage Patents 9 forms', accumulated over Days 19-28: a 'confine in order to use' design genealogy that spans 6 physical cage forms (electron / charge / static molecular / electrical / dynamic molecular / container / ion) and 3 logical cage forms (type / policy / capability).
Axis 4 is the '12 sub-series excavation map', which grew from the original 4 sub-series (Patent / IR / Standard / Declassified) by adding Kitchen Health / Pharma / Cosmetic / Hardware-Energy / Internet-Crypto / Software-UI / AI-ML / Food-Health.
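The distinction between a session 'Day' and a calendar day, and the episode arithmetic stated in this note, can be checked with a minimal Python sketch. The names `SESSION_RANGES` and `session_date` are hypothetical, and placing Days 1-2 on 2026-05-06 is an inference from the note (Day 1 starts on 5/6 and Day 3 falls on 5/7), not something the note states outright.

```python
from datetime import date

# Session ("Day") ranges per calendar date, taken from the synthesis note.
# Assumption: Days 1-2 fall on 2026-05-06; the note only says that Day 1
# starts on 5/6 and that 5/7 hosts Day 3 through Day 13.
SESSION_RANGES = [
    (range(1, 3), date(2026, 5, 6)),    # Day 1-2 (inferred)
    (range(3, 14), date(2026, 5, 7)),   # Day 3-13
    (range(14, 30), date(2026, 5, 8)),  # Day 14-29
]


def session_date(day: int) -> date:
    """Return the calendar date on which a given session 'Day' took place."""
    for sessions, calendar_date in SESSION_RANGES:
        if day in sessions:
            return calendar_date
    raise ValueError(f"Day {day} is outside the 29-session plan")


# Totals stated in the note, recomputed from the parts.
total_sessions = sum(len(sessions) for sessions, _ in SESSION_RANGES)
total_episodes = 31 + 69  # excavation notes + excavation memos
calendar_span = (date(2026, 5, 8) - date(2026, 5, 1)).days + 1  # inclusive

if __name__ == "__main__":
    print(session_date(13), total_sessions, total_episodes, calendar_span)
```

Keeping the ranges in one table makes the '29 sessions in 3 calendar days' claim directly checkable against the per-date breakdown.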
TEMPLATES
All Prompts and the Full Pipeline — A Complete Kit for Starting in Your Own Field
Templates — the entire set of weapons used across posts 1–5, in one reproducible file
The final post of the series. Every prompt I actually used in posts 1–5 (candidate selection, content extraction, modern translation, grading, pitfall checks), the complete pipeline diagram, the tool stack, and the reproduction steps, all collapsed into one article. The goal: by the end, you can start mining forgotten long-form documents in your own field today.
PITFALLS
The Three Big Traps of LLM-Mediated Archaeology — Fabrication, Cost Explosion, Misreading
Pitfalls — the failures I actually hit across posts 1–5, and the prompts that fix them
When you use an LLM to mine forgotten long-form documents, you will almost certainly hit three traps: fabrication (fake citations), cost explosion (503s, model deprecations, token blowups), and misreading (language barriers, term-of-art confusion). This post discloses every real incident from posts 1–5 of this series, with the prompts and operating rules that prevent each one.
EPISODE 01
AI Archaeology: Mining Forgotten Long-Form Documents with LLMs
What a 3M-view tweet about expired patents × Claude showed me — a whole new content genre nobody is doing yet
A 3M-view tweet by @gippp69 revealed something bigger than Amazon arbitrage: a meta-method for using LLMs to mine documents nobody reads anymore. Patents, IR archives, decommissioned standards, declassified reports — Claude can read them in a single night. Humans cannot. This article opens a 7-part series on doing exactly that.