
MLOps Coffee Sessions #14 Conversation with the Creators of Dask // Hugo Bowne-Anderson and Matthew Rocklin

Oct 12, 2020 · 57 min · Season 1, Ep. 14

Episode description

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter


Dask
What is it?
Parallelism for analytics
What is parallelism?
Doing a lot at once by splitting tasks into smaller subtasks, which can be processed in parallel (at the same time)
Distributing work across multiple machines and then combining the results
Helpful for CPU-bound work: doing a bunch of calculations on the CPU. The rate at which the process progresses is limited by the speed of the CPU.
Concurrency?
Similar, but things don't have to happen at the same time; they can happen asynchronously and overlap.
Shared state
Helpful for I/O-bound work: networking, reading from disk, etc. The rate at which a process progresses is limited by the speed of the I/O subsystem.
Multi-core vs distributed
Multi-core: a single processor with two or more cores that can cooperate through threads (multithreading)
Distributed: work spread across multiple nodes communicating via HTTP or RPC
Why is this hard?
Python has its challenges due to the GIL (global interpreter lock); many other languages don't have this limitation
Shared state can lead to race conditions, deadlocks, etc.
Coordinating work across machines
For analytics?
Calculating some statistics on a large dataset can be tricky if it can’t fit in memory
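
To make the outline above concrete, here is a minimal sketch of the split-and-combine idea: a mean is computed one partition at a time, so the full dataset never has to sit in memory at once. It uses only the Python standard library, not Dask itself. The names `load_partition`, `partial_stats`, and `parallel_mean` are hypothetical, and `load_partition` merely generates numbers where a real pipeline would read a slice of a file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def load_partition(start: int, stop: int) -> List[float]:
    """Hypothetical stand-in for reading rows [start, stop) from disk."""
    return [float(i) for i in range(start, stop)]


def partial_stats(bounds: Tuple[int, int]) -> Tuple[float, int]:
    """Subtask: load one partition and reduce it to (sum, count)."""
    data = load_partition(*bounds)
    return sum(data), len(data)


def parallel_mean(n_rows: int, chunk_size: int = 1000, workers: int = 4) -> float:
    """Split the rows into partitions, reduce each one, then combine."""
    # Each task is only a small (start, stop) descriptor; the data itself
    # is loaded inside the worker and dropped once it has been reduced.
    bounds = [
        (start, min(start + chunk_size, n_rows))
        for start in range(0, n_rows, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, bounds))

    # Combine step: the partial sums and counts are enough for the mean.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot take the mean of zero rows")
    return total / count


if __name__ == "__main__":
    print(parallel_mean(1_000_000, chunk_size=50_000))
```

Threads are used here only to keep the sketch simple. Because of the GIL, pure-Python CPU-bound work like this would likely need processes (for example `ProcessPoolExecutor`) or a distributed scheduler to see a real speedup, while threads suit the I/O-bound case.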


// Show Notes

Coiled Cloud: https://cloud.coiled.io/
Coiled Launch Announcement: https://medium.com/coiled-hq/coiled-dask-for-everyone-everywhere-376f5de0eff4
OSS article: https://www.forbes.com/sites/glennsolomon/2020/09/15/monetizing-open-source-business-models-that-generate-billions/#2862e47234fd
Amish barn raising: https://www.youtube.com/watch?v=y1CPO4R8o5M
Message Passing Interface: https://en.wikipedia.org/wiki/Message_Passing_Interface


----------- Connect With Us ✌️-------------

Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register


Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with David on LinkedIn: https://www.linkedin.com/in/aponteanalytics/
Connect with Matthew on LinkedIn: https://www.linkedin.com/in/matthew-rocklin-461b4323/

Timestamps:
0:00 - Intro to Matthew Rocklin and Hugo Bowne-Anderson
0:37 - Matthew Rocklin's Background
1:17 - Hugo Bowne-Anderson's Background
3:47 - Where did that inspiration come from?
10:04 - Is there a close relationship between Best Practices and Tooling, or are these two separate things?
11:27 - Why is Data Literacy important with Coiled?
14:46 - How do you think about the balance between enabling Data Science to have a lot of powerful compute?
17:05 - Machine Learning as a space for tracking best practices experimentation
19:32 - What makes Data Science so difficult?  
24:07 - How can a for-profit company complement Open Source Software (OSS)
29:40 - Amazon becoming a competitor with your own open-source technology (?)
32:50 - How do you encourage more people to contribute and ensure quality?
34:58 - Do you see Coiled operating within the Dask ecosystem?
37:30 - What is Dask?
39:19 - What should people know about parallelism?
41:28 - Why is it so hard to put things back together?
41:34 - Why does Python need a whole new tool to enable that? Or maybe some other tools as well?
44:44 - Dynamic Task Scheduling as being useful to Data Scientists
47:15 - Why is reliability in particular important in Data Science?
52:27 - What's in store for Dask?
