Snacks Weekly on Data Science

This podcast is about making data science and machine learning knowledge accessible and less intimidating. Every week, I handpick one industry tech blog and break it down. We discuss key data science concepts and machine learning algorithms, and how they are applied in real-world applications. Subscribe to the channel and enjoy Snacks Weekly on Data Science!

Episodes

Product bundle recommendation with Graph Learning and GPT [CVS Health]

In this episode, we will introduce what CVS Health is and the importance of product recommendations for their business needs. We will delve into how their data science team leveraged advanced technologies, including Graph Neural Networks and generative AI models like GPT-4, to develop a prototype system to make product bundle recommendations. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/cvs-health-tech-blog/ai-in-cvs-front-store...

Dec 09, 2024 · 11 min

Augmentation techniques for imbalanced text classification [Walmart]

In this episode, we will introduce the issue of data imbalance and its impact on machine learning models, especially for text data. We will discuss a range of augmentation techniques and walk through how Walmart’s data science team built an automated augmentation module to apply these techniques consistently and effectively. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/augmentation-techniques-for-imbalanced-t...

Dec 02, 2024 · 10 min

Optimize delivery picking process with mathematical modeling [Instamart]

In this episode, we will discuss what Instamart is, its business model, and the need to optimize operational efficiency continuously. We will explore how the team tackled picker assignment issues, tested a multi-order batching solution, and applied modeling techniques to enhance Instamart’s speed and service quality. For more details, you can refer to their published tech blog, linked here for your reference: https://bytes.swiggy.com/optimizing-the-picking-process-to-enable-faster-deliveries-for...

Nov 25, 2024 · 8 min

Building Contextualised Moderation Classifier [GovTech Singapore]

In this episode, we introduce GovTech Singapore and its reasons for tackling the content moderation problem. We discuss their innovative approach to building the moderation classifier, which involves using a consensus voting mechanism with existing commercial LLMs to improve labeling in the training dataset, providing a strong foundation for developing the machine learning model. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/dsai...

Nov 18, 2024 · 10 min

Promotion aware demand forecasting for groceries [Afresh]

In this episode, we will introduce Afresh, explore the concept of demand forecasting, and discuss the critical role of promotions in grocery forecasting. We will also examine Afresh’s technical solution, which combines deep learning with a carefully curated set of data inputs to create an accurate, promotion-aware demand forecasting system. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/afresh-engineering/buy-one-get-one-free-prom...

Nov 11, 2024 · 11 min

Graph technology in fraud detection and prevention [Booking.com]

In this episode, we will explore the business model of Booking.com and the unique challenges it faces in preventing fraud. We will discuss how graph technology can enhance fraud detection by representing data through relationships and how combining this with machine learning enables Booking.com to detect and stop fraudulent activity in real-time. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/booking-com-development/leverage-graph...

Nov 04, 2024 · 10 min

Marketing mix modeling in marketing measurement [Qonto]

In this episode, we will introduce a key marketing challenge, attribution, and explore how marketing mix modeling (MMM) can help solve it. We will also discuss how MMM works in practice and examine the trade-offs between using consultants for MMM development versus building an in-house solution. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/qonto-way/marketing-measurement-series-marketing-mix-modeling-at-qonto-337b8af11471...

Oct 28, 2024 · 9 min

Causal machine learning to power data driven decisions [Urban Company]

In this episode, we will explore Urban Company’s business needs and the role of causal machine learning in addressing them. We will delve into three key models (S-learner, T-learner, and X-learner) and use a simple example to illustrate how each one works. These models provide valuable insights by offering more accurate estimates of cause-and-effect relationships, helping companies make better data-driven decisions. For more details, you can refer to their published tech blog, linked here for you...

Oct 21, 2024 · 12 min

Advanced Product Categorization with Vision Language Models [Faire]

In this episode, we will explore how Faire tackled the challenge of product categorization. They initially used the K-nearest neighbor algorithm with CLIP embeddings, which improved categorization but still required manual corrections. To further enhance accuracy, the team fine-tuned a vision-language model using their in-house dataset, increasing accuracy significantly. This solution showcases how advanced machine learning can drive business efficiency. For more details, you can refer to their ...

Oct 14, 2024 · 11 min

Leverage CUPED to reduce experimentation lifecycle [Walmart]

In this episode, we will discuss why Walmart relies on online experimentation to drive data-driven decisions and how reducing variance is a crucial challenge to making these experiments more efficient. We will also introduce the CUPED methodology, explaining how Walmart leverages it to speed up its experimentation process, enabling faster, more reliable insights for continuous improvement. For more details, you can refer to their published tech blog, linked here for your reference: https://mediu...

Oct 07, 2024 · 12 min

Building video classifiers with vision language models and active learning [Netflix]

In this episode, we will explore the challenge Netflix faces in building machine learning models for video understanding. We will examine Netflix’s solution—a self-service system with active learning that empowers video experts to participate in creating and refining machine learning classifiers through a streamlined, three-step process. For more details, you can refer to their published tech blog, linked here for your reference: https://netflixtechblog.com/video-annotator-building-video-classif...

Sep 30, 2024 · 10 min

Measuring Marketing Incrementality with Geo Testing [Expedia]

In this episode, we explore the concept of incrementality in marketing and how the data science team at Expedia leverages geo-testing to measure the incrementality of their marketing campaigns. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/expedia-group-tech/measuring-marketing-success-the-power-of-incrementality-and-geo-testing-1acd4291545d...

Sep 23, 2024 · 12 min

Personalized Out-of-App Marketing Strategy [Uber]

In this episode, we explore the concept of out-of-app marketing and its importance for businesses like Uber. We'll discuss the challenges involved in personalizing out-of-app marketing messages, the essential components of their recommendation architecture, and how the team created customized solutions to improve personalization and relevance in their recommendations. For more details, you can refer to their published tech blog, linked here for your reference: https://www.uber.com/blog/personali...

Sep 16, 2024 · 13 min

Predicting Estimated Time of Arrival (ETA) Reliability [Lyft]

In this episode, we will discuss the importance of ETA for ridesharing apps and the challenges of providing a reliable ETA to users upfront. We will delve into the practices of the machine learning team at Lyft, examining how they developed a solution to address this unique challenge using a lightweight machine learning model. For more details, you can refer to their published tech blog, linked here for your reference: https://eng.lyft.com/eta-estimated-time-of-arrival-reliability-at-lyft-d4ca2720bd...

Sep 09, 2024 · 11 min

Moderating Inappropriate Video Content [Yelp]

In this episode, we will explore how Yelp navigates the challenges of incorporating video reviews into its platform. We will discuss the use of machine learning to detect inappropriate content and the strategies to maintain the quality and integrity of its platform. For more details, you can refer to their published tech blog, linked here for your reference: https://engineeringblog.yelp.com/2024/03/moderating-inappropriate-video-content-at-yelp.html...

Sep 02, 2024 · 10 min

Measuring brand perception with social media data and deep learning [Airbnb]

In this episode, we will introduce what a brand is and why measuring brand perception is crucial. We will discuss how social media data can be used to achieve this and look at how Airbnb leverages deep learning methods and word embedding technologies to quantify brand perception more effectively. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/airbnb-brandometer-powering-brand-perception-measurement-on-social-med...

Aug 26, 2024 · 12 min

Empower Decision Making with Regression Discontinuity Design [Instacart]

In this episode, we will explore the concept of quasi-experimentation and its role in Instacart's decision-making process. We will take a closer look at one specific methodology, regression discontinuity design, explaining its key concepts and demonstrating how it works through an interesting example. For more details, you can refer to their published tech blog, linked here for your reference: https://tech.instacart.com/optimizing-at-the-edge-using-regression-discontinuity-designs-to-power-decis...

Aug 19, 2024 · 14 min

Product Recommendation with Deep Learning and Reinforcement Learning [LinkedIn]

In this episode, we will discuss the machine learning architecture built by LinkedIn for their premium product recommendation. We will explore its key components, which include a two-tower neural network and reinforcement learning. For more details, you can refer to their published tech blog, linked here for your reference: https://www.linkedin.com/blog/engineering/machine-learning/matching-linkedin-members-with-the-right-premium-products...

Aug 12, 2024 · 14 min

Measure Semantic Relevance in Search with Large Language Models (LLMs) [Faire]

In this episode, we will discuss why search relevance is important for Faire, how their data team quantifies semantic relevance in search, and how they leverage large language models (LLMs) to measure it efficiently. For more details, you can refer to their published tech blog, linked here for your reference: https://craft.faire.com/fine-tuning-llama3-to-measure-semantic-relevance-in-search-86a7b13c24ea...

Aug 05, 2024 · 14 min

Making Informed Decisions in A/B Tests with Multiple Metrics [Spotify]

In this episode, we will touch on the importance of A/B testing in the product decision-making process. We will share the four types of metrics and the decision-making framework used by the Data Science team at Spotify, as well as the necessary statistical adjustments that need to be incorporated into experimentation to ensure a solid statistical foundation. For more details, you can refer to their published tech blog, linked here for your reference: https://engineering.atspotify.com/2024/03/ris...

Jul 29, 2024 · 14 min

Improving ETA Predictions with Advanced Deep Learning Architecture [DoorDash]

In this episode, we will discuss the importance of Estimated Time of Arrival (ETA) for DoorDash and how the company enhanced its machine learning model through three key directions: upgrading from a tree-based model to a deep-learning architecture, adopting a multi-task modeling approach, and leveraging probabilistic models. For more details, you can refer to their published tech blog, linked here for your reference: https://doordash.engineering/2024/03/12/improving-etas-with-multi-task-models-d...

Jul 22, 2024 · 14 min

Forecasting with the balance of art and science [Meta]

In this episode, we will discuss the intricacy of balancing both the art and science aspects in forecasting. We will explore this through two key aspects: validation of forecasting and integrating product impact into forecasting, where combining the art and science can be crucial to enhancing forecasting performance. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/forecasting-meta-balancing-art-and-science-92526e1a...

Jul 15, 2024 · 15 min

Optimize Feature Selection with Genetic Algorithm [JustEatTakeaway.com/Grubhub]

In this episode, we will discuss what feature selection is and how JustEatTakeaway.com leverages genetic algorithms to optimize their feature selection. For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/justeattakeaway-tech/optimising-feature-selection-with-genetic-algorithms-an-easy-to-use-python-script-dde44cc9c053...

Jul 08, 2024 · 15 min

Monitoring Mechanisms for Recommendation Systems [Tubi TV]

In this episode, we will discuss how to build an effective monitoring mechanism for a recommendation system. We will cover the basic components of various recommendation processes and explore how the Tubi TV Engineering team developed their monitoring flow. For more details, you can refer to their published tech blog, linked here for your reference: https://code.tubitv.com/how-to-monitor-a-recommender-system-6d720c922c90...

Jul 01, 2024 · 13 min

Determine Causal Effects through Adopter Analysis [Walmart]

In this episode, we will introduce the concept of causal inference and discuss how the data science team at Walmart determines causal effects through adopter analysis when A/B tests are infeasible. This episode is based on their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/how-to-determine-causal-effects-when-a-b-tests-are-infeasible-through-adopter-analysis-b06f2d51a633...

Jun 24, 2024 · 14 min

Measure Web Performance with Composite Metric [Indeed]

In this episode, we will discuss the importance of metrics for the business. We will share the journey of how Indeed started with a single metric to measure client-side performance and eventually converged on a composite metric that serves as a comprehensive measure. This episode is based on their published tech blog, linked here for your reference: https://engineering.indeedblog.com/blog/2024/01/composite-web-performance-metric...

Jun 17, 2024 · 16 min

Developing Text-to-SQL Feature with Large Language Models (LLMs) [Pinterest]

In this episode, we will explore how Pinterest uses generative AI technology to develop a text-to-SQL feature for their data analytics team. We will examine the general architecture of their two iterations and how the team has enhanced the AI product to better support a wider range of analytics use cases with improved productivity. This episode is based on their published tech blog, linked here for your reference: https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinter...

Jun 10, 2024 · 12 min

A/B Testing with Cluster Experimentation Under Strong Network Effects [Meta]

In this episode, we'll discuss what network effects are, how they introduce challenges in the standard A/B testing framework, and how the cluster experimentation method can be leveraged to address these challenges. We will also delve into the technical details of how clusters can be generated and evaluated, and the associated trade-offs that need to be considered. This episode is based on their published tech blog, linked here for your reference: https://medium.com/@AnalyticsAtMeta/how-meta-tes...

Jun 03, 2024 · 15 min

Measuring Marketing Effectiveness with Geo-experimentation [Grammarly]

In this episode, we'll explore how the data science team from Grammarly developed their geo-experimentation to measure marketing effectiveness. We will cover three components in designing an A/B testing experiment, as well as considerations regarding the opportunity costs of the experimentation. This episode is based on their published tech blog, linked here for your reference: https://www.grammarly.com/blog/engineering/measuring-marketing-effectiveness-in-a-cookie-less-world/...

May 27, 2024 · 14 min

Monte Carlo Simulation for Sampled Success Metrics [Shopify]

In this episode, we'll explore how the data science team from Shopify leverages Monte Carlo simulation to develop their sampled success metrics. We'll discuss what sampled success metrics are, the associated trade-offs needed to build them, and how Monte Carlo simulation can be used to inform decisions. This episode is based on their published tech blog, linked here for your reference: https://shopify.engineering/monte-carlo-simulations-sampled-success-metrics...

May 20, 2024 · 16 min