The Case of the Blah Blah Blahs
Dec 08, 2020 • 37 min • Season 3, Ep. 8
Episode description
A famous dataset of Reuters articles from the 1980s includes "Blah blah blah." in place of some stories. Why?
We have a Patreon now! Sign up to support the show and get access to our bonus podcast, Overunderstood.
Show notes:
- 0:31 - The link Jess sent
- 8:31 - SGML
- 8:46 - This is what the blahs look like and this is what all the entries look like.
- 24:00 - FTP
- 24:34 - Linguistic Data Consortium
- 29:00 - RCV1 at NIST and David D. Lewis’s README
- 30:22 - Construe-TIS: A System for Content-Based Indexing of a Database of News Stories (Phil Hayes and Steven Weinstein)