Data Science at Home

Francesco Gadaleta · datascienceathome.podbean.com

Cutting through AI bullsh*t.
Come join the discussion on Discord!
https://discord.gg/4UNKGf3

Episodes

Don't be naive with data anonymization (Ep. 98)

Masking, obfuscating, stripping, shuffling. All of these techniques try to do one simple thing: keep data private while sharing it with third parties. Unfortunately, they are not the silver bullet for confidentiality. All the players in the synthetic data space rely on simplistic techniques that are not secure, might not be compliant and are risky in production. At pryml we do things differently.

Mar 08, 2020 · 14 min · Ep. 95
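To make the risk of naive masking concrete, here is a minimal Python sketch of a classic linkage attack. It assumes a toy dataset: every name, ZIP code and diagnosis below is made up for illustration, and the helpers `mask_name`, `link` and `dictionary_attack` are hypothetical. Names are "masked" with an unsalted hash, but quasi-identifiers (ZIP code, birth year, sex) are kept, so a join with a public table re-identifies people, and the hash itself falls to a simple dictionary attack. This illustrates the general weakness, not pryml's approach.

```python
import hashlib

# Columns left untouched in the "anonymised" data (assumed for this toy example).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def mask_name(name: str) -> str:
    # Naive masking: unsalted SHA-256 of the name, truncated for readability.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]

def link(shared, public):
    # Index the public table by the quasi-identifiers still present in the shared data.
    index = {}
    for person in public:
        key = tuple(person[k] for k in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])
    # A unique match on the quasi-identifiers re-identifies the record.
    reidentified = {}
    for record in shared:
        names = index.get(tuple(record[k] for k in QUASI_IDENTIFIERS), [])
        if len(names) == 1:
            reidentified[names[0]] = record["diagnosis"]
    return reidentified

def dictionary_attack(masked_id, candidate_names):
    # Unsalted hashes of guessable values are reversed by hashing candidates.
    for name in candidate_names:
        if mask_name(name) == masked_id:
            return name
    return None

# Hypothetical toy data, for illustration only.
shared = [
    {"id": mask_name("Alice Rossi"), "zip": "20121", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"id": mask_name("Bruno Bianchi"), "zip": "00184", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
]
public = [
    {"name": "Alice Rossi", "zip": "20121", "birth_year": 1984, "sex": "F"},
    {"name": "Bruno Bianchi", "zip": "00184", "birth_year": 1990, "sex": "M"},
    {"name": "Carla Verdi", "zip": "20121", "birth_year": 1975, "sex": "F"},
]

print(link(shared, public))  # {'Alice Rossi': 'diabetes', 'Bruno Bianchi': 'asthma'}
print(dictionary_attack(shared[0]["id"], [p["name"] for p in public]))  # Alice Rossi
```

Note that neither attack needs to break the hash function: the join works on the columns that were never masked, and the dictionary attack works because names are guessable.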

Why sharing real data is dangerous (Ep. 97)

There are very good reasons why a financial institution should never share its data. Actually, it should never even move its data. Ever. In this episode I explain why.

Mar 01, 2020 · 11 min · Ep. 94

Building reproducible machine learning in production (Ep. 96)

Building reproducible models is essential in all those scenarios in which the lead developer collaborates with other team members. Reproducibility in machine learning should not be an art; it should be achieved through a methodical approach. In this episode I give a few suggestions on how to make your ML models reproducible and keep your workflow smooth. Enjoy the show! Come visit us on our Discord channel and have a chat...

Feb 22, 2020 · 14 min · Ep. 93
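A minimal sketch of two common reproducibility habits, assuming a plain Python/NumPy workflow (the episode's specific suggestions are not listed here, so these are generic practices): derive every random generator from a single seed stored in the config, and record a fingerprint of the config plus library versions next to each result. `run_experiment` and the config keys are hypothetical. Deep learning frameworks keep their own generators (for example `torch.manual_seed` in PyTorch), which would need seeding as well.

```python
import hashlib
import json
import platform
import random

import numpy as np

def config_fingerprint(config: dict) -> str:
    # Canonical JSON (sorted keys) so logically equal configs hash identically.
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def run_experiment(config: dict) -> dict:
    # Every source of randomness is derived from the single seed in the config.
    py_rng = random.Random(config["seed"])
    np_rng = np.random.default_rng(config["seed"])
    data = np_rng.normal(size=config["n_samples"])
    batch = py_rng.sample(range(config["n_samples"]), k=config["batch_size"])
    result = float(data[batch].mean())
    # Store what is needed to reproduce the run alongside the result itself.
    return {
        "result": result,
        "config_sha256": config_fingerprint(config),
        "python": platform.python_version(),
        "numpy": np.__version__,
    }

cfg = {"seed": 7, "n_samples": 100, "batch_size": 10}
print(run_experiment(cfg) == run_experiment(cfg))  # True
```

Because the fingerprint is computed over JSON with sorted keys, two configs that differ only in key order map to the same hash, which makes it a convenient key for caching or comparing runs.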

Bridging the gap between data science and data engineering: metrics (Ep. 95)

Data science and data engineering are usually two different departments in an organisation. Bridging the gap between the two is essential to success. Brilliant applications created by data scientists often never make it to production, simply because they are not production-ready. In this episode I talk with Daan Gerits, co-founder and CTO at Pryml.io.

Feb 14, 2020 · 13 min · Ep. 92

A big welcome to Pryml: faster machine learning applications to production (Ep. 94)

Why so much silence? Building a company! That's why :) I am building pryml, a platform that allows data scientists to build their applications on data they cannot get access to. This is the first of a series of episodes in which I will speak about the technology and the challenges we are facing while we build it. Happy listening and stay tuned!

Feb 07, 2020 · 9 min · Ep. 91

It's cold outside. Let's speak about AI winter (Ep. 93)

In the last episode of 2019 I speak with Filip Piekniewski about some of the most noteworthy findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and by claims that are hard to believe. Many of the promises made a few years ago have proven quite hard to achieve, if not impossible. Let's stay grounded and realistic about the potential of this amazing field of research, so as not to bring disillusion in the near future. Join us on our Discord ...

Dec 31, 2019 · 37 min · Ep. 90

The dark side of AI: bias in the machine (Ep. 92)

This is the fourth and last episode of the mini series "The dark side of AI". I am your host Francesco and I’m with Chiara Tonini from London. The title of today’s episode is Bias in the machine. C: Francesco, today we are starting with an infuriating discussion. Are you ready to be angry? F: Yeah, sure. Is this about Brexit? No, I don’t talk about that. In 1986 New York City’s Rockefeller University conducted a study on breast and uterine cancers and their link to obesity. Like in all clinical t...

Dec 28, 2019 · 20 min · Ep. 89

The dark side of AI: metadata and the death of privacy (Ep. 91)

Get in touch with us: join the discussion about data science, machine learning and artificial intelligence on our Discord server. Episode transcript: We always hear the word “metadata”, usually in a sentence that goes like this: “Your Honor, I swear, we were not collecting users’ data, just metadata.” Usually the guy saying this sentence is Zuckerberg, but it could be anybody from Amazon or Google. “Just” metadata, so no problem. This is one of the biggest lies about the reality of data collection. F: Ok ...

Dec 23, 2019 · 23 min · Ep. 88

The dark side of AI: recommend and manipulate (Ep. 90)

In 2017 a research group at the University of Washington conducted a study of the Black Lives Matter movement on Twitter. They constructed what they call a “shared audience graph” to analyse the different groups of audiences participating in the debate, and found that the groups aligned with the political left and the political right, as well as clear alignments with groups participating in other debates, like environmental issues, abortion issues and so on. In simple terms, someone who is pro-environm...

Dec 11, 2019 · 21 min · Ep. 87
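A sketch of one plausible way to build a shared audience graph, assuming that each account's audience is its set of followers and that two accounts are linked when their audiences overlap, with the overlap measured by Jaccard similarity. The study's exact definition and thresholds may differ; `followers`, `min_weight` and the account names are hypothetical.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    # Share of the combined audience that follows both accounts.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def shared_audience_graph(followers: dict, min_weight: float = 0.1) -> dict:
    # followers maps account -> set of follower ids (hypothetical input format).
    # Returns weighted edges {(account_u, account_v): overlap} above min_weight.
    edges = {}
    for u, v in combinations(sorted(followers), 2):
        weight = jaccard(followers[u], followers[v])
        if weight >= min_weight:
            edges[(u, v)] = weight
    return edges

followers = {
    "account_a": {1, 2, 3},
    "account_b": {2, 3, 4},
    "account_c": {7, 8},
}
print(shared_audience_graph(followers))  # {('account_a', 'account_b'): 0.5}
```

Running a community detection algorithm on the resulting weighted graph would then group accounts whose audiences largely coincide, which is the kind of grouping the study relates to political alignment.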

The dark side of AI: social media and the optimization of addiction (Ep. 89)

Chamath Palihapitiya, former Vice President of User Growth at Facebook, was giving a talk at Stanford University, when he said this: “I feel tremendous guilt. The short-term, dopamine-driven feedback loops that we have created are destroying how society works.” He was referring to how social media platforms exploit our neurological make-up in the same way slot machines and cocaine do, to keep us using their products as much as possible. They turn us into addicts. F: how many times do you chec...

Dec 03, 2019 · 23 min · Ep. 86

More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they all use the transformer architecture. This architecture is built on top of another important concept already known to the community: self-attention. In this episode I explain what these mechanisms are, how they work and why they are so powerful. Don't forget to subscribe to our newsletter or join the discussion on our Discord server. References: Attention is all you need https://arxiv.org/abs/1706.03762; The ill...

Nov 27, 2019 · 38 min · Ep. 85

How to improve the stability of training a GAN (Ep. 88)

Generative Adversarial Networks, or GANs, are very powerful tools for generating data. However, training a GAN is not easy. More specifically, GANs suffer from three major issues: instability of the training procedure, mode collapse and vanishing gradients. In this episode I explain not only the most challenging issues one would encounter while designing and training Generative Adversarial Networks, but also some methods and architectures to mitigate them. In addition I elucidate the three speci...

Nov 18, 2019 · 28 min · Ep. 84
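One way to see the vanishing-gradient problem is to compare the two generator losses from the original GAN formulation (Goodfellow et al., 2014). With discriminator logit l and D = sigmoid(l), the minimax generator loss log(1 - D) has gradient -sigmoid(l) with respect to l, which vanishes when the discriminator confidently rejects fakes (very negative l). The widely used non-saturating loss -log(D) has gradient -(1 - sigmoid(l)), which stays close to -1 in that regime. This is a minimal sketch of one standard mitigation; whether the episode covers this particular fix is an assumption.

```python
import numpy as np

def sigmoid(logit):
    return 1.0 / (1.0 + np.exp(-logit))

def minimax_generator_grad(logit):
    # d/dl log(1 - sigmoid(l)) = -sigmoid(l): vanishes as l -> -inf.
    return -sigmoid(logit)

def non_saturating_generator_grad(logit):
    # d/dl [-log(sigmoid(l))] = -(1 - sigmoid(l)): approaches -1 as l -> -inf.
    return -(1.0 - sigmoid(logit))

# Early in training the discriminator confidently rejects fakes (very negative logits).
for logit in (-10.0, -2.0, 0.0):
    print(logit, minimax_generator_grad(logit), non_saturating_generator_grad(logit))
```

Both losses push the generator in the same direction (toward larger logits); the difference is only in how much gradient signal survives when the generator is doing badly.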

What if I train a neural network with random data? (with Stanisław Jastrzębski) (Ep. 87)

What happens to a neural network trained with random data? Are massive neural networks just lookup tables or do they truly learn something? Today’s episode is about memorisation and generalisation in deep learning, with Stanislaw Jastrzębski from New York University. Stan spent two summers as a visiting student with Prof. Yoshua Bengio and has been working on: understanding and improving how deep networks generalise; representation learning; natural language processing; computer aided drug desig...

Nov 12, 2019 · 20 min · Ep. 83
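A toy illustration of pure memorisation, assuming a 1-nearest-neighbour classifier as the "lookup table" (it is a stand-in, not a neural network). Trained on random labels, it reaches perfect training accuracy while test accuracy stays at chance, which is exactly the signature that random-label experiments on neural networks look for. Data sizes and the seed are arbitrary choices.

```python
import numpy as np

def one_nn_predict(x_train, y_train, x_query):
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    # Each query copies the label of its closest training point: a pure lookup table.
    return y_train[np.argmin(d2, axis=1)]

rng = np.random.default_rng(0)
x_train = rng.standard_normal((200, 10))
y_train = rng.integers(0, 2, size=200)  # random labels: nothing to generalise
x_test = rng.standard_normal((500, 10))
y_test = rng.integers(0, 2, size=500)

train_acc = (one_nn_predict(x_train, y_train, x_train) == y_train).mean()
test_acc = (one_nn_predict(x_train, y_train, x_test) == y_test).mean()
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The interesting question raised in the episode is whether large networks behave like this lookup table, or whether on real data they learn structure that the lookup table cannot.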

Deep learning is easier when it is illustrated (with Jon Krohn) (Ep. 86)

In this episode I speak with Jon Krohn, author of Deep Learning Illustrated, a book that makes deep learning easier to grasp. We also talk about some important guidelines to take into account whenever you implement a deep learning model, how to deal with bias in machine learning used to match jobs to candidates, and the future of AI. You can purchase the book from informit.com/dsathome with code DSATHOME and get 40% off books/eBooks and 60% off video training...

Nov 05, 2019 · 45 min · Ep. 82

[RB] How to generate very large images with GANs (Ep. 85)

Join the discussion on our Discord server. In this episode I explain how a research group from the University of Lübeck overcame the curse of dimensionality in generating large medical images with GANs. The problem is not as trivial as it seems: many researchers have failed to generate large images with GANs before. One interesting application of this approach is in medicine, for the generation of CT and X-ray images. Enjoy the show! References: Multi-scale GANs for Memory-efficient Gene...

Nov 04, 2019 · 15 min · Ep. 81

More powerful deep learning with transformers (Ep. 84)

Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they all use the transformer architecture. This architecture is built on top of another important concept already known to the community: self-attention. In this episode I explain what these mechanisms are, how they work and why they are so powerful. Don't forget to subscribe to our newsletter or join the discussion on our Discord server. References: Attention is all you need https://arxiv.org/abs/1706.03762; The ill...

Oct 27, 2019 · 38 min · Ep. 80
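A minimal NumPy sketch of scaled dot-product self-attention as defined in "Attention is all you need": Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, where in self-attention Q, K and V are linear projections of the same input sequence. This is a single head with no masking, and the projection matrices are random placeholders for what would be learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ v, weights         # output = weighted average of the values

def self_attention(x, w_q, w_k, w_v):
    # Self-attention: queries, keys and values all come from the same sequence x.
    return scaled_dot_product_attention(x @ w_q, x @ w_k, x @ w_v)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
# Random stand-ins for learned projection matrices.
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

The division by sqrt(d_k) keeps the dot products from growing with the dimension, which would otherwise push the softmax into a regime with very small gradients.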

[RB] Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan) (Ep. 83)

Join the discussion on our Discord server. In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, a replication of the much-discussed model that OpenAI decided not to release because it was deemed too accurate to be published. We discuss image-to-image translation, the dangers of the GPT-2 model and the future of AI. Moreover, Aaron provides some very interesting links and demos that will blow your mind! Enj...

Oct 18, 2019 · 38 min · Ep. 79

What is wrong with reinforcement learning? (Ep. 82)

Join the discussion on our Discord server. Reinforcement learning agents have done great at playing Atari video games and Go (AlphaGo), at financial trading and at language modeling; now let me tell you the real story. In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well! What is wrong with it? Are you a listener of Data Science at Home podcast? A reade...

Oct 15, 2019 · 22 min · Ep. 78

Have you met Shannon? Conversation with Jimmy Soni and Rob Goodman about one of the greatest minds in history (Ep. 81)

Join the discussion on our Discord server. In this episode I have an amazing conversation with Jimmy Soni and Rob Goodman, authors of “A Mind at Play”, a book entirely dedicated to the life and achievements of Claude Shannon. Claude Shannon does not need any introduction. But for those who need a refresher, Shannon is the inventor of the information age. Have you heard of binary code, entropy in information theory, data compression theory (the stuff behind mp3, mpg, zip, etc.), error correcting co...

Oct 10, 2019 · 32 min · Ep. 77
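Shannon entropy H = sum_i p_i log2(1 / p_i) gives the average number of bits per symbol needed to encode a source, which is the lower bound that lossless compressors such as zip work against. A minimal sketch that treats the empirical character frequencies of a string as the source distribution (an assumption made for illustration):

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    # Empirical entropy in bits per symbol: H = sum_i p_i * log2(1 / p_i).
    if not message:
        return 0.0
    n = len(message)
    return sum((c / n) * math.log2(n / c) for c in Counter(message).values())

for text in ("aaaa", "aabb", "abcd"):
    print(text, shannon_entropy(text))
# aaaa 0.0
# aabb 1.0
# abcd 2.0
```

A string with one repeated symbol carries no information per symbol, while four equally frequent symbols need two bits each, matching the intuition behind binary codes.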

Attacking machine learning for fun and profit (with the authors of SecML) (Ep. 80)

Join the discussion on our Discord server. As ML plays an increasingly relevant role in many domains of everyday life, it is no surprise to see more and more attacks against ML systems. In this episode we talk about the most popular attacks against machine learning systems and some mitigations designed by researchers Ambra Demontis and Marco Melis, from the University of Cagliari (Italy). The guests are also the authors of SecML, an open-source Python library for the security evaluation of Machine Le...

Oct 01, 2019 · 34 min · Ep. 76
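A minimal sketch of a gradient-sign evasion attack (in the style of the fast gradient sign method) against a linear classifier with score s = w·x + b. For a sample with label y in {-1, +1}, the margin y·s has gradient y·w with respect to x, so stepping x by -eps·sign(y·w) lowers the margin by eps·||w||_1, the largest drop possible when every feature may change by at most eps. This is a generic illustration, not SecML's API; the weights and the sample are made up.

```python
import numpy as np

def predict(w, b, x):
    # Linear classifier: label +1 if the score is non-negative, else -1.
    return 1 if float(w @ x + b) >= 0.0 else -1

def gradient_sign_evasion(w, x, y, eps):
    # The margin y * (w.x + b) has gradient y * w with respect to x; step against
    # its sign so that every feature changes by exactly eps (an L-infinity budget).
    return x - eps * np.sign(y * w)

# Hypothetical model and sample.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([1.0, 0.0, 1.0])

y = predict(w, b, x)  # score 1.5 -> +1
x_adv = gradient_sign_evasion(w, x, y, eps=0.5)
print(y, predict(w, b, x_adv))  # 1 -1
```

Here the score drops from 1.5 to 1.5 - 0.5 * 3.5 = -0.25, so a perturbation of 0.5 per feature is enough to flip the decision; with eps = 0.2 the score only drops to 0.8 and the prediction survives.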

[RB] How to scale AI in your organisation (Ep. 79)

Join the discussion on our Discord server. Scaling technology and scaling business processes are not the same thing. Since the early days of enterprise technology, scaling software has been a difficult task to get right inside large organisations. When it comes to Artificial Intelligence and Machine Learning, it becomes vastly more complicated. In this episode I propose a framework, in five pillars, for the business side of artificial intelligence....

Sep 26, 2019 · 13 min · Ep. 66

Replicating GPT-2, the most dangerous NLP model (with Aaron Gokaslan) (Ep. 78)

Join the discussion on our Discord server. In this episode, I am with Aaron Gokaslan, computer vision researcher and AI Resident at Facebook AI Research. Aaron is the author of OpenGPT-2, a replication of the much-discussed model that OpenAI decided not to release because it was deemed too accurate to be published. We discuss image-to-image translation, the dangers of the GPT-2 model and the future of AI. Moreover, Aaron provides some very interesting links and demos that will blow your mind! Enj...

Sep 23, 2019 · 38 min · Ep. 74

Training neural networks faster without GPU [RB] (Ep. 77)

Join the discussion on our Discord server. Training neural networks faster usually involves the use of powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who train neural networks faster by getting more out of the available hardware and making the training pipeline denser. Enjoy the show! References: Faster Neural Network Training with Data Echoing https://arxiv.org/abs/1907.05550...

Sep 17, 2019 · 22 min · Ep. 73

How to generate very large images with GANs (Ep. 76)

Join the discussion on our Discord server. In this episode I explain how a research group from the University of Lübeck overcame the curse of dimensionality in generating large medical images with GANs. The problem is not as trivial as it seems: many researchers have failed to generate large images with GANs before. One interesting application of this approach is in medicine, for the generation of CT and X-ray images. Enjoy the show! References: Multi-scale GANs for Memory-efficient Gene...

Sep 06, 2019 · 15 min · Ep. 72
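A minimal sketch of the memory-saving idea behind multi-scale generation, assuming that a coarse image has already been generated and that a patch-level generator then refines the upsampled image one patch at a time, so peak working memory depends on the patch size rather than on the full resolution. `refine_patch` is a hypothetical placeholder for a trained generator (here it only adds small noise), and the paper's actual architecture is not reproduced.

```python
import numpy as np

def upsample_nearest(img, factor):
    # Nearest-neighbour upsampling: repeat every pixel factor x factor times.
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def refine_patch(patch, rng):
    # Hypothetical placeholder for a trained patch-level generator that would add
    # high-resolution detail conditioned on the coarse patch.
    return patch + 0.01 * rng.standard_normal(patch.shape)

def generate_multiscale(low_res, factor, patch_size, seed=0):
    rng = np.random.default_rng(seed)
    coarse = upsample_nearest(np.asarray(low_res, dtype=float), factor)
    height, width = coarse.shape
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide the upsampled image size")
    out = np.empty_like(coarse)
    # Only one patch is refined at a time, so the generator's working memory
    # scales with patch_size**2 instead of height * width.
    for i in range(0, height, patch_size):
        for j in range(0, width, patch_size):
            out[i:i + patch_size, j:j + patch_size] = refine_patch(
                coarse[i:i + patch_size, j:j + patch_size], rng)
    return out

low_res = np.zeros((16, 16))  # stand-in for a generated low-resolution image
high_res = generate_multiscale(low_res, factor=8, patch_size=32)
print(high_res.shape)  # (128, 128)
```

In 3D medical volumes the saving is larger still: doubling the resolution multiplies the number of voxels by eight, while the patch size, and so the generator's memory footprint, can stay fixed.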

[RB] Complex video analysis made easy with Videoflow (Ep. 75)

In this episode I am with Jadiel de Armas, senior software engineer at Disney and author of Videoflow, a Python framework that facilitates the quick development of complex video analysis applications and other series-processing applications in a multiprocessing environment. I have inspected the Videoflow repo on GitHub and some of the capabilities of this framework, and I must say that it’s really interesting. Jadiel is going to tell us a lot more than what you can read on GitHub. Reference...

Aug 29, 2019 · 31 min · Ep. 71

[RB] Validate neural networks without data with Dr. Charles Martin (Ep. 74)

In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work. The questions that Charles answers in the show are essentially two: Why is regularisation in deep learning seemingly quite different from regularisation in other areas of ML? How can we master DNNs in a theoretically principled way? R...

Aug 27, 2019 · 45 min · Ep. 70

How to cluster tabular data with Markov Clustering (Ep. 73)

In this episode I explain how a community detection algorithm known as Markov clustering can be constructed by combining simple concepts like random walks, graphs and similarity matrices. Moreover, I highlight how one can build a similarity graph and then run a community detection algorithm on that graph to find clusters in tabular data. You can find a simple hands-on code snippet to play with on the Amethix Blog. Enjoy the show! References [1] S. Fortunato, “Community detection in graphs”, Physics R...

Aug 20, 2019 · 21 min · Ep. 69
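A minimal NumPy sketch of the pipeline described above, assuming a Gaussian-kernel similarity graph over the rows of a table, with weak edges pruned, followed by the standard Markov clustering loop: add self-loops, make the matrix column-stochastic (a random-walk transition matrix), then alternate expansion (matrix power, i.e. several random-walk steps) and inflation (element-wise power plus renormalisation) until convergence. Clusters are read off as connected components of the converged matrix. The parameter values (`sigma`, `threshold`, `inflation`) are illustrative assumptions, and this is not the Amethix Blog snippet.

```python
import numpy as np

def similarity_graph(x, sigma=1.0, threshold=1e-4):
    # Gaussian-kernel similarity between the rows of a table x (n_rows, n_features).
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    adj = np.exp(-d2 / (2.0 * sigma ** 2))
    adj[adj < threshold] = 0.0  # prune weak edges to keep the graph sparse
    np.fill_diagonal(adj, 0.0)
    return adj

def normalize_columns(m):
    return m / m.sum(axis=0, keepdims=True)

def markov_cluster(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    # Column-stochastic random-walk matrix, with self-loops added first.
    m = normalize_columns(adj + np.eye(adj.shape[0]))
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along walks
        m = normalize_columns(m ** inflation)     # inflation: strengthen strong flows
        if np.allclose(m, prev, atol=tol):
            break
    return m

def clusters_from_matrix(m, eps=1e-5):
    # Clusters = connected components of the non-negligible entries of m.
    n = m.shape[0]
    linked = (m > eps) | (m.T > eps)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

rng = np.random.default_rng(42)
x = np.vstack([rng.normal(0.0, 0.5, size=(10, 2)),    # first group of rows
               rng.normal(10.0, 0.5, size=(10, 2))])  # second, well-separated group
labels = clusters_from_matrix(markov_cluster(similarity_graph(x)))
print(labels)
```

The inflation parameter controls granularity: larger values make the flow concentrate faster and tend to produce more, smaller clusters.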

Waterfall or Agile? The best methodology for AI and machine learning (Ep. 72)

The two most widely considered software development models in modern project management are, without any doubt, the Waterfall Methodology and the Agile Methodology. In this episode I compare the two and explain what I believe is the best choice for your machine learning project. An interesting post to read (mentioned in the episode) is How businesses can scale Artificial Intelligence & Machine Learning https://amethix.com/how-businesses-can-scale-artificial-intelligence-ma...

Aug 14, 2019 · 14 min · Ep. 68

Training neural networks faster without GPU (Ep. 71)

Training neural networks faster usually involves the use of powerful GPUs. In this episode I explain an interesting method from a group of researchers at Google Brain, who train neural networks faster by getting more out of the available hardware and making the training pipeline denser. Enjoy the show! References: Faster Neural Network Training with Data Echoing https://arxiv.org/abs/1907.05550

Aug 06, 2019 · 22 min · Ep. 67
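Data echoing, as described in the referenced paper, reuses the output of an upstream pipeline stage (reading, decoding, augmentation) when that stage is slower than the accelerator, so the accelerator is not left idle waiting for fresh data. A minimal generator-based sketch, assuming a plain Python iterator as the upstream stage; `echo_factor` and the simple shuffle buffer are illustrative, not the paper's code.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def echo(upstream: Iterable[T], echo_factor: int) -> Iterator[T]:
    # Emit each upstream item echo_factor times, so every expensive upstream
    # result is consumed echo_factor times by the training step downstream.
    if echo_factor < 1:
        raise ValueError("echo_factor must be >= 1")
    for item in upstream:
        for _ in range(echo_factor):
            yield item

def shuffle_buffer(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    # Simple shuffle buffer so that repeated copies are not always adjacent.
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            yield buffer.pop(rng.randrange(len(buffer)))
    rng.shuffle(buffer)
    yield from buffer

print(list(echo(["batch0", "batch1"], echo_factor=2)))
# ['batch0', 'batch0', 'batch1', 'batch1']
```

Echoing trades some sample freshness for throughput: repeated examples are less informative than new ones, which is why mixing them through a shuffle buffer is a reasonable default.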

Validate neural networks without data with Dr. Charles Martin (Ep. 70)

In this episode, I am with Dr. Charles Martin from Calculation Consulting, a machine learning and data science consulting company based in San Francisco. We speak about the nuts and bolts of deep neural networks and some impressive findings about the way they work. The questions that Charles answers in the show are essentially two: Why is regularisation in deep learning seemingly quite different from regularisation in other areas of ML? How can we master DNNs in a theoretically principled way? R...

Jul 23, 2019 · 45 min · Ep. 65
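A minimal sketch of a data-free layer diagnostic in the spirit of the random-matrix analysis of weight matrices discussed with Charles Martin: compute the eigenvalues of a layer's correlation matrix W^T W / N, then fit a power law to the tail of that spectrum with the continuous maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)). This is not the WeightWatcher library's API, and the random weight matrix is a hypothetical stand-in for a trained layer. Choosing x_min is left to the caller; a careful analysis would pick it with a goodness-of-fit criterion (for example the Clauset, Shalizi and Newman method).

```python
import numpy as np

def empirical_spectral_density(w):
    # Eigenvalues of the layer correlation matrix X = W^T W / N (N = rows of W).
    n_rows = w.shape[0]
    return np.linalg.eigvalsh(w.T @ w / n_rows)

def power_law_alpha(values, x_min):
    # Continuous power-law MLE for the density exponent:
    # alpha = 1 + n / sum(ln(x_i / x_min)) over the n values with x_i >= x_min.
    values = np.asarray(values, dtype=float)
    tail = values[values >= x_min]
    log_sum = np.sum(np.log(tail / x_min))
    if tail.size == 0 or log_sum == 0.0:
        raise ValueError("not enough spread in the tail above x_min")
    return 1.0 + tail.size / log_sum

rng = np.random.default_rng(0)
# Hypothetical layer weights: a random Gaussian matrix (no training involved).
w = rng.standard_normal((500, 100))
eigs = empirical_spectral_density(w)
x_min = np.quantile(eigs, 0.5)  # illustrative choice of x_min, not a fitted one
print(eigs.shape, power_law_alpha(eigs, x_min))
```

No validation data is touched: the estimate depends only on the weights, which is what makes this kind of spectral analysis attractive when test data is unavailable.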