Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows - podcast episode cover

Dok Talks #111 - Scheduled Scaling with Dask and Argo Workflows

Jan 19, 20221 hr 6 minSeason 1Ep. 111
--:--
--:--
Download Metacast podcast app
Listen to this episode in Metacast mobile app
Don't just listen to podcasts. Learn from them with transcripts, summaries, and chapters for every episode. Skim, search, and bookmark insights. Learn more

Episode description

https://go.dok.community/slack
https://dok.community/

ABSTRACT OF THE TALK

Complex computational workloads in Python are a common sight these days, especially in the context of processing large and complex datasets. Battle-hardened modules such as Numpy, Pandas, and Scikit-Learn can perform low-level tasks, while tools like Dask makes it easy to parallelize these workloads across distributed computational environments. Meanwhile, Argo Workflows offers a Kubernetes-native solution to provisioning cloud resources in Kubernetes and triggering workflows on a regular schedule. Being Kubernetes-native, Argo Workflows also meshes nicely with other Kubernetes tools. This talk discusses the combination of these two worlds by showcasing a set-up for Argo-managed workflows which schedule and automatically scale-out Dask-powered data pipelines in Python.

BIO

Former academic in the field of renewable energy simulation and energy systems analysis. Currently responsible for architecting and maintaining the cloud- and data strategy at ACCURE Battery Intelligence

KEY TAKE-AWAYS FROM THE TALK

Argo Workflows + Dask is a nice combination for data-processing pipelines. There are a a few "gotchyas" to be on the look-out for, but in nevertheless this is still a generally-applicable and powerful combination.

https://github.com/sevberg

For the best experience, listen in Metacast app for iOS or Android