IBM Filed a Statistical Translation Patent in 1991. Here Is the Problem It Was Trying to Solve.
Note on this format: This memo records what I found at the patent URL and in publicly available sources. Full text and Claim 1 have not been read. Verified facts only; speculation is labeled as such.
Why Dig This
When people trace the history of translation AI today, they usually start with the Transformer (2017) or neural MT (around 2014). More than two decades earlier, IBM Research was already working on something that sounds surprisingly similar: learning to translate from data, not rules. But the design is fundamentally different from neural approaches. Reading this patent is useful precisely because it shows where the problem orientation was shared and where the actual engineering diverged.
Patent Basics
- Patent number: US5477451A
- Title: Method and system for natural language translation
- Filed: 1991 (exact date: not confirmed from full text)
- Inventors: Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, Robert L. Mercer, and others
- Assignee: IBM Corporation
- Primary source: Google Patents (URL confirmed; full text unread)
- Legal status: Details not confirmed
Core Content (Wikipedia and Public Sources)
IBM Research (T.J. Watson Research Center) developed what became known as IBM Models 1–5: probabilistic translation models. The core idea: given a large parallel corpus (the same text translated between two languages), compute the probability that word A in language X corresponds to word B in language Y. Translation is then a search for the most probable target sentence given a source sentence.
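The word-correspondence idea can be made concrete with IBM Model 1, the simplest of the five, as described in the public literature (Brown et al.); whether the patent claims this exact structure is unconfirmed. The sketch below runs expectation-maximization on a tiny invented English–French corpus (the sentences are hypothetical, not from the patent) to estimate t(f|e), the probability that source word e translates to target word f:

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical sentence pairs, for illustration only).
corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
    (["a", "book"], ["un", "livre"]),
]

# Uniform initialization of t(f|e).
e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

for _ in range(10):  # EM iterations
    count = defaultdict(float)  # expected counts c(f, e)
    total = defaultdict(float)  # per-source-word normalizers
    # E-step: distribute each target word's probability mass over
    # the source words it could align to.
    for es, fs in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e) in t:
        t[(f, e)] = count[(f, e)] / total[e] if total[e] else 0.0

# Because "book" co-occurs with "livre" under two different contexts,
# EM concentrates probability on that pairing.
print(t[("livre", "book")], t[("livre", "the")])
```

In the full system these learned word probabilities feed a noisy-channel search: choose the target sentence e maximizing P(e) × P(f|e), i.e., language model times translation model. That is the "search for the most probable target sentence" the memo refers to.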
This was commercialized as the Candide system, one of the first large-scale data-driven machine translation deployments. The system did not require linguists to write grammar rules — it learned from text.
Claim 1 wording and model structure details are unconfirmed — full text not read.
Connections to Today (Hypothesis)
| US5477451A (1991) | Modern translation technology | Assessment |
|---|---|---|
| Learn translation from parallel corpus | LLM pretraining from large text corpora | Analogy (both learn from data; mechanisms differ fundamentally) |
| Word-level probability correspondence | Transformer attention over subword tokens | Does not map well (designs are incompatible) |
| Rule-free, data-driven translation | Neural MT and LLM translation generally | Similar (shared problem orientation — no hand-written rules) |
Important clarification: Statistical MT (SMT) and neural MT (NMT) are not a continuous evolution. NMT largely replaced SMT around 2014–2016. Calling this a "predecessor of LLM translation" would be misleading. A more accurate framing: this is a record of the shift from rule-based to data-driven translation design — and a different branch from the one that led to current LLMs.
These are pre-full-text hypotheses. Assessment will be revised after Claim 1 review.
What's Unconfirmed
- Claim 1 verbatim text
- Exact relationship to the Candide system (same patent or separate?)
- Connection to the ALPAC Report (1966) — ALPAC ended an era of MT funding; Candide represented a partial revival
- Forward citation count
Next Action
Read Abstract and Claim 1 to confirm the model structure. Cross-reference with Brown et al. (1990, Computational Linguistics) to map the relationship between the academic paper and the patent. Potential companion article connecting to the ALPAC episode.
Sources:
- Primary patent: US5477451A on Google Patents
- Related episode: Declassified Archaeology #1 — The ALPAC Report (1966)
- AI & ML Patent #1 (full note): Amazon item-to-item collaborative filtering US6266649B1 (1998)