Data Archives - Software Engineering Daily

softwareengineeringdaily.com
Databases and data engineering episodes of Software Engineering Daily

Episodes

Scalable Streaming Video with Amit Mishra

The internet is a layer cake of technologies and protocols. At a fundamental level, the internet runs on the TCP/IP protocol. It’s a packet-based system. When your browser requests a file from a web server, that server chops up the file into tiny pieces known as packets and puts them on the network labeled ...

Nov 10, 2021 - 36 min
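
The packet splitting described in this summary can be sketched roughly as follows. This is a toy illustration only: the function names `split_into_packets` and `reassemble` and the 1460-byte payload size are assumptions made for the example, not details from the episode, and real TCP segmentation is done by the operating system's network stack rather than by application code.

```python
def split_into_packets(data: bytes, payload_size: int = 1460) -> list[tuple[int, bytes]]:
    """Chop data into (sequence_number, payload) pairs (hypothetical helper)."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        (seq, data[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(data), payload_size))
    ]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original bytes, even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)
```

The sequence number attached to each piece is what lets the receiver restore the original order, which is the role the "labeled" packets play in the summary.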

Observability Using Honeycomb.io with Christine Yen

It does not matter if it runs on your machine. Your code must run in the production environment and it must do so performantly. For that, you need tooling to better understand your application’s behavior under different circumstances. In the earliest days of software development, all we had were logs, which are still around and ...

Nov 08, 2021 - 49 min

Location-Based Experiences Using Foursquare with Ankit Patel

The manner in which users interact with technology has rapidly switched to mobile consumption. The devices almost all of us carry with us at all times open endless opportunities for developers to create location-based experiences. Foursquare became a household name when they introduced social check-ins. Today they’re a location data platform. Ankit Patel is the ...

Nov 03, 2021 - 48 min

Datadog with Omri Sass and Hugo Kaczmarek

Modern business applications are complex. It’s not enough to have raw logs or some basic telemetry. Today’s enterprise organizations require an application performance monitoring solution, or APM. Today’s applications are complex distributed systems whose performance depends on a wide variety of factors. Every single line of code can affect production, and teams need insights into ...

Oct 28, 2021 - 39 min

Infrastructure as Code with Christian Tragesser

Infrastructure as Code is an approach to machine provisioning and setup in which a programmer describes the underlying services they need for their projects. However, this infrastructure code doesn’t compile a binary artifact like traditional source code. The successful completion of running the code signals that the servers and other components described in the configuration ...

Oct 08, 2021 - 44 min

Modern Data Infrastructure and Tools with Leigh Marie Braswell

The first industrial deployments of machine learning and artificial intelligence solutions were bespoke by definition and often had brittle operating characteristics. Almost no one builds custom databases, web servers, or email clients. Yet technology groups today often consider developing homegrown ML and data solutions in order to solve their unique use cases. Today’s modern data ...

Oct 05, 2021 - 48 min

Git Scales for Monorepos with Derrick Stolee

A monorepo is a version control strategy in which all your code is contained in one potentially large but complete repository. The monorepo is in stark contrast to an alternative approach in which software teams independently manage microservices or deliver software as libraries to be imported in other projects.

Oct 01, 2021 - 54 min

Faking Data Using Tonic.ai with Ian Coe and Adam Kamor

Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn’t need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable problems. Organizations need a solution to protect privacy ...

Sep 29, 2021 - 50 min

DBT: Data Build Tool with Tristan Handy

Applications write data to persistent storage like a database. The most popular database query language is SQL, which has many similar dialects. SQL is expressive and powerful for describing what data you want. What you do with that data requires a solution in the form of a data pipeline. Ideally, these analytical workflows can follow ...

Sep 28, 2021 - 45 min

No Code Process Automation at Axiom with Yaseer Sheriff

Tedious, repetitive tasks are better handled by machines. Unless these tasks truly require human intelligence, repetitive tasks are often good candidates for automation. Implementing process automation can be challenging and technical. Increasingly, engineers are seeking out tools and platforms to facilitate faster, more reliable automation. In this episode I talk to Yaseer Sheriff, Co-Founder and ...

Sep 24, 2021 - 44 min

LinearB with Dan Lines

A developer’s core deliverables are individual commits and the pull requests they aggregate into. While the number of lines of code written alone may not be very informative, in total, the code and metadata about the code found in tracking systems present a rich dataset with great promise for analysis and productivity optimization insights. LinearB ...

Sep 21, 2021 - 46 min

Modern Data Stacks Optimized by Mozart Data with Peter Fishman and Dan Silberman

Modern companies leverage dozens or even hundreds of software solutions to solve specific needs of the business. Organizations need to collect all these disparate data sources into a data warehouse in order to add value. The raw data typically needs transformation before it can be analyzed. In many cases, companies develop homegrown solutions, thus reinventing ...

Sep 14, 2021 - 51 min

Instabase with Anant Bhardwaj

Instabase is a technology platform for building automation solutions. Users deploy it onto their own infrastructure and can leverage the tools offered by the platform to build complex workflows for handling tasks like income verification and claims processing. In this episode we interview Anant Bhardwaj, founder of Instabase. He describes Instabase as an operating system.

Sep 07, 2021 - 48 min

InfluxData: Time-Series Data with Russ Savage

Time series data are simply measurements or events that are tracked, monitored, downsampled, and aggregated over time. This could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market, and many other types of analytics data (influxdata.com). The platform InfluxData is designed for building and operating time series applications.

Aug 19, 2021 - 44 min
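
As a rough illustration of the "downsampled and aggregated over time" idea in this summary, the sketch below groups timestamped points into fixed windows and averages each window. It is plain Python and does not use or imitate any InfluxData API; the function name `downsample` and the sample data are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (unix_timestamp_seconds, value) points per fixed-size window.

    Returns a list of (window_start, mean_value) pairs sorted by window_start.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of the window it falls in.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

For example, five readings spread over a little more than two minutes collapse into three one-minute averages when `window_seconds` is 60.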

Druid: Event-Driven Data with Eric Tschetter

Whether sending messages, shopping in an app, or watching videos, modern consumers expect information and responsiveness to be near-instant in their apps and devices. From a developer’s perspective, this means clean code and a fast database. Apache Druid is a database built to power real-time analytic workloads for event-driven data, like user-facing applications, streaming, and ...

Aug 16, 2021 - 56 min

DaaS with Auren Hoffman

Auren Hoffman is the CEO of SafeGraph. In this episode we discuss data as a service and more. This interview was also recorded as a video podcast. Check out the video on the Software Daily YouTube channel. Sponsorship inquiries: sponsor@softwareengineeringdaily.com

Aug 13, 2021 - 1 hr 48 min

Reverse ETL: Operationalizing Data Warehouses with Tejas Manohar

Enterprise data warehouses store all company data in a single place to be accessed, queried, and analyzed. They’re essential for business operations because they support managing data from multiple sources, providing context, and have built-in analytics tools. While keeping a single source of truth is important, easily moving data from the warehouse to other applications ...

Aug 02, 2021 - 54 min

Prophecy: Apple of Data Engineering with Raj Bains

Prophecy is a complete Low-Code Data Engineering Platform for the Enterprise. Prophecy enables all your teams on Apache Spark with a unique low-code designer. While you visually build your Dataflows – Prophecy generates high-quality Spark code on Git. Then, you can schedule Spark workflows with Prophecy’s low-code Airflow. Not only that, Prophecy provides end-to-end visibility ...

Jul 28, 2021 - 58 min

Pulsar Rerevisted with Enrico Olivelli

In the previous episode, Pulsar Revisited, we discussed how the company DataStax has added to their product stack Astra Streaming, their cloud-native messaging and event streaming service that’s built on top of Apache Pulsar. We discussed Apache Pulsar and the added features DataStax offers like injecting machine learning into your data streams and viewing real-time ...

Jul 26, 2021 - 56 min

CockroachDB: Distributed Databases and Containerization with Spencer Kimball

In 2003, Google developed a robust cluster management system called Borg. This enabled them to manage clusters with tens of thousands of machines, moving them away from virtual machines and firmly into container management. Then, in 2014, they open sourced a version of Borg called Kubernetes, or K8s. Now, in 2021, CockroachDB is a distributed ...

Jul 21, 2021 - 52 min

Imply Infra: Big Data Analysis and Real-World Examples with Jad Naous

Big data analytics is the process of collecting data, processing and cleaning it, then analyzing it with techniques like data mining, predictive analytics, and deep learning. This process requires a suite of tools to operate efficiently. Data analytics can save companies money, drive product development, and give insight into the market and customers. The company ...

Jul 19, 2021 - 45 min

Better Stack: A New DevOps Experience with Juraj Masar

DevOps has shortened the development life cycle for countless applications and is embraced by companies around the world. But managing and monitoring multiple environments is still a major pain point, particularly when companies need to mix cloud and legacy systems. Knowing when services go down and quickly pinpointing the cause is essential for continuous development.

Jul 15, 2021 - 52 min

Data Science on AWS: Implementing AI and ML Pipelines on AWS with Chris Fregly

Data science is an interdisciplinary field that combines strong technical skills with industry knowledge to perform a large range of jobs. Data scientists solve business questions with hands-on work cleaning and analyzing data, building machine learning models and applying algorithms, and generating dynamic visuals and tools to understand the world from the data it generates.

Jul 14, 2021 - 47 min

Data Lineage: Understanding Data Lineage at Scale with Julien Le Dem

Big Data has exploded over the past decade as cloud computing and more efficient hardware made scaling essentially limitless. Products like Uber revolve entirely around analyzing data to provide rides. According to an EMC/IDC study, there was approximately 5.2TB of data for every person in 2020. That estimate was made before the transition to remote work, ...

Jul 12, 2021 - 59 min

Text Blaze: Text Shortcuts with Scott Fortmann-Roe

There are over 4 billion people using email. Many people using email for business communicate quick questions to colleagues, send repetitive, template-based information to potential customers and freshly hired employees, and repeat a lot of the same phrases. We actually repeat phrases in a lot of written formats. How often do you copy and paste ...

Jul 03, 2021 - 46 min

LayerCI with Colin Chartier

Continuous integration is a coding practice where engineers deliver incremental and frequent code changes to create higher quality software and collaborate more. Teams attempting to continuously integrate new code need a consistent and automated pipeline for reviewing, testing, and deploying the changes. Otherwise, change requests pile up in the queue and nothing gets integrated efficiently.

Jul 02, 2021 - 50 min

Meltano: ELT for DataOps with Douwe Maan

ELT is a process for copying data from a source system into a target system. It stands for “Extract, Load, Transform” and starts with extracting a copy of data from the source location. It’s loaded into the target system like a data warehouse, and then it’s ready to be transformed into a usable format for ...

Jul 01, 2021 - 52 min
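
The extract, load, transform order described in this summary can be sketched as below. This is a minimal illustration, assuming an in-memory SQLite database as a stand-in for the data warehouse; it does not use Meltano, and the table names, sample rows, and function names are all hypothetical.

```python
import sqlite3


def extract() -> list[tuple[str, str]]:
    """Extract: copy rows from a (hypothetical) source, left in raw text form."""
    return [("alice", "10.50"), ("bob", "3.25"), ("alice", "4.50")]


def load(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Load: write the raw rows into the target system without reshaping them."""
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    """Transform: build a usable table with SQL that runs inside the target."""
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        GROUP BY customer
        """
    )


def run_elt() -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    load(conn, extract())
    transform(conn)
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    return conn.execute(query).fetchall()
```

The point of the ordering is that the raw data lands in the target first, so the transformation step is ordinary SQL executed where the data already lives.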

Uber Data Science with Kevin Novak

Uber is one of many examples we’ve discussed on this show that has changed the world with big data analysis. With over 8 million users, 1 billion Uber trips and people driving for Uber in over 400 cities and 66 countries, Uber has redefined an entire industry in a very short time frame. It’s difficult ...

Jun 24, 2021 - 50 min

Axiom Browser Automation with Yaseer Sheriff

The quantity and quality of a company’s data can mean the difference between a major success and a major failure. Companies like Google have used big data from their earliest days to steer their product suite in the direction consumers need. Other companies, like Apple, didn’t always use big data analytics to drive product design, but ...

Jun 23, 2021 - 38 min

StreamSets: DataOps and Smart Pipelines with Arvind Prabhakar

The company StreamSets is enabling DataOps practices in today’s enterprises. StreamSets is a data engineering platform designed to help engineers design, deploy, and operate smart data pipelines. StreamSets Data Collector is a codeless solution for designing pipelines, triggering CDC operations, and monitoring data in flight. StreamSets Transformer uses Apache Spark to generate insights about your ...

Jun 17, 2021 - 56 min