034. Web Scraping and Data Science

Jan 21, 2022 | 51 min | Season 1 | Ep. 35

Episode description

Data collection is a crucial step in any data-related project, so much so that you might have encountered the “GIGO” (garbage in, garbage out) concept. Some might even say that having the right data is more important than having tons of data that can’t be used.

Web scraping is one way to collect data, so for this episode we invited Cliff, a data consultant, back to discuss his personal experience with it. He covered the basics of web scraping, web scraping tools, the challenges he faced while scraping web content, the ethics of web scraping, learning materials, and more!

Resources:

  1. Cliff's Medium post 1: https://medium.com/codex/scraping-singapore-libraries-f74c541f1f94
  2. Cliff's Medium post 2: https://cliffy-gardens.medium.com/iterations-for-my-nlb-scraper-github-code-provided-b4e1f1bd422e
  3. Selenium: https://www.selenium.dev/
  4. BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
  5. TagUI: https://github.com/kelaberetiv/TagUI
  6. Web Scraping with Python: https://www.oreilly.com/library/view/web-scraping-with/9781491985564/