45: Trino swimming with the DolphinScheduler

Mar 20, 2023 · 1 hr 55 min · Ep. 45

Episode description

DolphinScheduler is a popular Apache data workflow orchestrator for running complex data pipelines. The project recently added a Trino integration, and in this episode the guests demonstrate how to use DolphinScheduler to run a series of transformations on a data lakehouse with Trino.

- Intro Music: 0:00

- Intro: 0:31

- Trino release 407: 13:22

- What is workflow orchestration?: 21:12

- Why do we need a workflow orchestration tool for building a data lake?: 31:07

- What is Apache DolphinScheduler?: 37:35

- Does DolphinScheduler have any computing engine or storage layer?: 53:11

- How does it differ from other workflow orchestration tools, such as Apache Airflow?: 58:46

- Demo: Creating a simple Trino workflow in DolphinScheduler: 1:26:44

- PR: Improve performance of Parquet files: 1:47:04

Show Notes: https://trino.io/episodes/45

Show Page: https://trino.io/broadcast/

For the best experience, listen in Metacast app for iOS or Android