78. Where do corpora come from?, with Matt Honnibal and Ines Montani

Jan 15, 2019 · 30 min

Episode description

Most NLP projects depend heavily on the quality of the annotations used to train and evaluate models. In this episode, Matt and Ines of Explosion AI tell us how Prodigy can improve data annotation and model development workflows. Prodigy is an annotation tool implemented as a Python library, and it comes with a web application and a command-line interface. A developer can define input data streams and design simple annotation interfaces. Prodigy can help break down complex annotation decisions into a series of binary decisions, and it provides easy integration with spaCy models. In an active learning framework, developers can specify how models should be updated as new annotations come in.

Prodigy: https://prodi.gy
Prodigy recipe scripts: https://github.com/explosion/prodigy-recipes
Twitter: https://twitter.com/_inesmontani https://twitter.com/honnibal