HPR3219: Linux Inlaws S01E18: Voice Recognition and Text to Speech
Dec 03, 2020
Episode description
In this episode, Chris is harassed by quite a few artificial nuisance callers, among them drug lords, Irish nurses and some random Linux Inlaws Chief Financial Officer. Based on these examples, our two heroes discuss the history and current state of text-to-speech (TTS) and voice recognition. We also attempted to use voice recognition software to produce a transcript of the show.
Shownotes:
- Wavenet: https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
- Tacotron: https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
- DeepSpeech: https://github.com/mozilla/DeepSpeech
- Lyrebird / Welcome.AI: https://www.welcome.ai/lyrebird
- Nvidia Tacotron 2: https://github.com/NVIDIA/tacotron2
- TensorFlow: https://www.tensorflow.org
- PyTorch: https://pytorch.org
- Mel spectrograms: https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53
- Graphcore: https://www.graphcore.ai
- FPGA: https://en.wikipedia.org/wiki/Field-programmable_gate_array
- IBM ROMP: https://en.wikipedia.org/wiki/IBM_ROMP
- Google's TTS: https://cloud.google.com/text-to-speech
- Apple M1: https://www.gsmarena.com/the_apple_m1_is_the_first_armbased_chipset_for_macs_with_the_fastest_cpu_cores_and_top_igpu-news-46222.php
- Secure Enclaves: https://support.apple.com/guide/security/secure-enclave-overview-sec59b0b31ff/web
- OSDU: https://www.opengroup.org/osdu/forum-homepage
- Jack Kerouac's On the Road: https://en.wikipedia.org/wiki/On_the_Road
