Researchers Built a Chatbot That Only Knows the World Before 1931

Three researchers have created a unique chatbot that hasn’t read anything published after 1930. Talkie is a 13-billion-parameter language model trained on digital scans of English-language texts published before the end of 1930. That cutoff matches the current US public-domain boundary: anything published through the end of that year is fair game, with no lawsuits from irate IP holders to worry about.

David Duvenaud, an associate professor of computer science and statistics at the University of Toronto, led the work with two collaborators. The model knows only what appeared in books, newspapers, legal texts, and other publications before its cutoff date, so it’s great for questions about Prohibition or World War One.

Why Train Such a Model? 🤔

The obvious question arises: why train an AI that doesn’t know what the Nazis did, what the internet is, or what an LLM even is? The project isn’t merely an exercise in viewing the “good old days” through rose-colored glasses; it’s an intellectual experiment. Duvenaud explained that such a model could be useful for examining how people might have interpreted laws or events at the time, using only the knowledge available then.

Another fun experiment: use it to see whether a model can “rediscover” later breakthroughs from earlier knowledge alone, probing the limits of AI reasoning.

Limitations of Talkie ⚠️

Talkie has definite weaknesses, which its creators readily acknowledge. For example, there was no digital publishing in 1930, so every word of Talkie’s corpus had to be transcribed from a scan, and OCR is famously imperfect, especially on the blurry print of the era. Future information can also creep in from misdated documents, despite the researchers’ best efforts to filter them out.

Other Projects Mentioned 📚

The Talkie project isn’t alone. In their paper, the researchers mention other projects such as Ranke-4b from the University of Zurich, a series of LLMs trained on historical snapshots of data. Trip also created Mr. Chatterbox, which he trained on a dataset of British literature from 1500-1900 to become, in his words, “a Victorian gentleman in silicon.” These projects are both fun experiments and useful windows into the workings of AI.

As the Talkie researchers put it: “Have you ever daydreamed about talking to someone from the past? What would you ask someone with no knowledge of the modern world? What would they ask you?”

This post is licensed under CC BY 4.0 by the author.