626: Subword Tokenization with Byte-Pair Encoding
Nov 11, 2022 • 7 min
Episode description
Word, character, and subword tokenization go head-to-head this week as Jon Krohn delivers a mini-bootcamp on tokenization for natural language processing (NLP).
Additional materials: www.superdatascience.com/626
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.