Building and managing data-intensive applications has traditionally been costly and complex, and has placed an operational burden on developers to maintain as their organization scales. Today’s developers, data scientists, and data engineers need a streamlined, single cloud data platform for building applications, pipelines, and machine learning models, without having to move or copy their The post Building on the Data Cloud with Torsten Grabs appeared first on Software Engineering Daily ....
Nov 07, 2022•40 min
Data analytics technology and tools have seen significant improvements in the past decade. But it can still take weeks to prototype, build, and deploy new transformations and deployments, usually requiring considerable engineering resources. Plus, most data isn’t real-time. Instead, most of it is still batch-processed. Tinybird Analytics provides an easy way to ingest and query The post Serverless Clickhouse for Developers with Jorge Sancha appeared first on Software Engineering Daily ....
Sep 12, 2022•35 min
Data is becoming a bank’s biggest asset. These complex enterprises have a huge opportunity ahead – to transform themselves to become a trusted hub of a much broader data ecosystem that goes beyond the financial industry and helps to form a new class of cross-industry experience architectures that are scalable and transparent. The data physics The post Data Infrastructure for Finance appeared first on Software Engineering Daily ....
Aug 18, 2022•54 min
Ian Coe (CEO) and Adam Kamor (Head of Engineering). Companies that gather data about their users have an ethical obligation and legal responsibility to protect the personally identifiable information in their dataset. Ideally, developers working on a software application wouldn’t need access to production data. Yet without high-quality example data, many technology groups stumble on avoidable The post Faking Data Using Tonic.ai with Ian Coe and Adam Kamor appeared first on Software Engineering Daily ....
Aug 05, 2022•47 min
Couchbase is a distributed NoSQL cloud database. Since its creation, Couchbase has expanded into edge computing, application services, and most recently, a database-as-a-service called Capella. Couchbase started as an in-memory cache and needed to be rearchitected to be a persistent storage system. In this episode, we interviewed Ravi Mayuram, SVP Products and Engineering at Couchbase. The post Couchbase with Ravi Mayuram appeared first on Software Engineering Daily ....
Jul 28, 2022•30 min
Streaming data platforms like Kafka, Pulsar, and Kinesis are now common in mainstream enterprise architectures, providing low-latency real-time messaging for analytics and applications. However, stream processing – the act of filtering, transforming, or analyzing the data inside the messages – is still an exercise left to the receiving microservice or datastore, a custom programming exercise The post Decodable Streaming with Eric Sammer appeared first on Software Engineering Daily ....
Jun 01, 2022•45 min
Data-as-a-service is a company category that is less common than API-as-a-service, software-as-a-service, or platform-as-a-service. In order to vend data, a data-as-a-service provider needs to define how that data will be priced, stored, and delivered to users: streaming over an API or served via static files. Naqeeb Memon of Safegraph joins the show The post Data Delivery with Naqeeb Memon appeared first on Software Engineering Daily ....
May 14, 2022•28 min
Data labeling allows machine learning algorithms to find patterns among the data. There are a variety of data labeling platforms that enable humans to apply labels to this data and ready it for algorithms. Heartex is a data labeling platform with an open source core. Michael Malyuk joins the show to talk through the platform The post Data Labeling with Michael Malyuk appeared first on Software Engineering Daily ....
May 11, 2022•42 min
Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a The post Pinot and StarTree with Chinmay Soman appeared first on Software Engineering Daily ....
May 09, 2022•44 min
Data loss can occur when large data sources such as Slack or Google Drive get leaked. In order to detect and avoid leaks, a data asset graph can be built to understand the risks of a company environment. Polymer is a data loss prevention product that helps companies avoid problematic data leaks. Yasir Ali is The post Data Loss Prevention with Yasir Ali appeared first on Software Engineering Daily ....
Apr 29, 2022•41 min
Data integration infrastructure is not easy to build. Moving large amounts of data from one place to another has historically required developers to build ad hoc integration points to move data between SaaS services, data lakes, and data warehouses. Today, there are dedicated systems and services for moving these large batches of data. Airbyte builds The post Airbyte Engineering with Michel Tricot appeared first on Software Engineering Daily ....
Apr 27, 2022•42 min
Modern organizations eventually face data governance challenges. Keeping track of where data came from, what systems update it, and in what ways updates can be made are just some of the issues to be tackled. Large organizations face additional challenges like training, onboarding, and capturing the institutional knowledge that leaves with the departure of key team The post Select Star with Shinji Kim appeared first on Software Engineering Daily ....
Apr 25, 2022•43 min
The solution many turn to for capturing their streaming data is InfluxDB. In this episode, I interview Brian Gilmore, Director of Product Management at InfluxData, about how real time applications achieve success built on top of InfluxDB. When most people hear the phrase Internet of Things, it typically evokes an image of connected devices we The post Time Series IoT on InfluxDB with Brian Gilmore appeared first on Software Engineering Daily ....
Apr 14, 2022•49 min
Lior Gavish and James Densmore. Data infrastructure is a fast-moving sector of the software market. As the volume of data has increased, so too has the quality of tooling to support data management and data engineering. In today’s show, we have a guest from a data intensive company as well as a company that builds a The post Data Engineering Trends with Lior Gavish and James Densmore appeared first on Software Engineering Daily ....
Apr 05, 2022•44 min
Running a database company requires expertise in both technical and managerial skills. There are deeply technical engineering questions around query paths, scalability, and distributed systems. And there are complex managerial questions around developer productivity and task allocation. Sam Lambert is the CEO of PlanetScale, which is building modern relational database infrastructure. Before PlanetScale he spent The post PlanetScale Management with Sam Lambert appeared first on Software Engineer...
Mar 31, 2022•49 min
SingleStore is a multi-use, multi-model database designed for transactional and analytic workloads, as well as search and other domain specific applications. SingleStore is the evolution of the database company MemSQL, which sought to bring fast, in-memory SQL database technology to market. Jordan Tigani is Chief Product Officer of SingleStore and joins the show to talk The post SingleStore with Jordan Tigani appeared first on Software Engineering Daily ....
Mar 29, 2022•43 min
DuckDB is a relational database management system with no external dependencies and a simple system for deployment and integration into build processes. It enables complex queries in SQL with a large function library, and provides transactional guarantees through multi-version concurrency control. Hannes Mühleisen works on DuckDB and joins the show to talk about query engines The post DuckDB with Hannes Mühleisen appeared first on Software Engineering Daily ....
Mar 19, 2022•49 min
Customer data pipelines power the backend of many successful web platforms. In a customer data pipeline, data is collected from sources such as mobile apps and cloud SaaS tools, transformed and munged using data engineering, stored in data warehouses, and piped to analytics, advertising platforms, and data infrastructure. RudderStack is an open source customer data The post RudderStack Engineering with Soumyadeb Mitra appeared first on Software Engineering Daily ....
Mar 16, 2022•47 min
The data lake architecture has become broadly adopted in a relatively short period of time. In a nutshell, that means data in its raw format stored in cloud object storage. Modern software and data engineers have no shortage of options for accessing their data lake, but that list shrinks quickly if you care about features The post Apache Hudi with Vinoth Chandar appeared first on Software Engineering Daily ....
Mar 09, 2022•43 min
A data catalog provides an index into the data sets and schemas of a company. Data teams are growing in size, and more companies than ever have a data team, so the market for data catalogs is larger than ever. Mark is the CEO of Stemma and the co-creator of Amundsen, a data catalog that came out of The post Data Catalog in Practice with Mark Grover appeared first on Software Engineering Daily ....
Feb 25, 2022•52 min
Splunk is a monitoring and logging platform that has evolved over its 18 years of existence. Its modern observability work focuses on open source and AIOps. Observability has evolved with the growth of Kubernetes, and Splunk’s work around OpenTelemetry has kept parity with the open source community of Kubernetes. Spiros Xanthos The post Splunk Platform with Spiros Xanthos appeared first on Software Engineering Daily ....
Feb 23, 2022•44 min
Barry McCardel (Co-Founder and CEO at Hex) and Caitlin Colgrove (Co-Founder and CTO at Hex). In contrast to other IDEs, the notebook interface offers software developers a unique environment idealized for data professionals. Despite the growth in popularity, a surprising learning curve still exists for setup and configuration. A siloed notebook offers no native collaboration tools. The post Hex Collaborative Data Workspace with Barry McCardel and Caitlin Colgrove appeared first on Software Engineering Da...
Feb 18, 2022•45 min
When writing code, test-driven development is a commonly accepted methodology to ensure the development of high quality software. Your organization’s data, on the other hand, is an entirely different challenge. Data can be missing due to human error, a failure with a 3rd party provider, a botched release, or dozens of other issues. When The post Data Quality Using Anomalo with Jeremy Stanley appeared first on Software Engineering Daily ....
Feb 17, 2022•47 min
Database product companies typically have a few phases. First, the company will develop a technology with some kind of innovation such as speed, scalability, or durability. The company will offer support contracts around that technology for a period of time, before eventually building a managed, hosted offering. PlanetScale is a database company built around the The post Scaling PlanetScale with Sugu Sougoumarane appeared first on Software Engineering Daily ....
Jan 31, 2022•48 min
Couchbase is a distributed NoSQL cloud database. Since its creation, Couchbase has expanded into edge computing, application services, and most recently a database-as-a-service called Capella. Couchbase started as an in-memory cache and needed to be rearchitected to be a persistent storage system. In this episode, I interview Ravi Mayuram, SVP Products and Engineering at Couchbase The post Couchbase Architecture with Ravi Mayuram appeared first on Software Engineering Daily ....
Jan 28, 2022•59 min
If you haven’t encountered a data quality problem, then you haven’t yet worked on a large enough project. Invariably, a gap exists between the state of raw data and what an analyst or machine learning engineer needs to solve their problem. Many organizations needing to automate data preparation workflows look to Trifacta as a solution. The post Trifacta with Joe Hellerstein appeared first on Software Engineering Daily ....
Dec 21, 2021•41 min
Relational databases have been a fixture of software applications for decades. They are highly tuned for performance and typically offer explicit guarantees like transactional consistency. More recently, there’s been a figurative Cambrian explosion of other-than-relational databases. Simple key-value stores or counters were an early win in this space. Managing a graph data structure is The post MemGraph with Dominik Tomicevic appeared first on Software Engineering Daily ....
Dec 10, 2021•43 min
The lifeblood of most companies is their sales departments. When you’re selling something other than a commodity, it’s typically necessary to carefully groom the onboarding experience for inbound future customers. Historically, companies approached this in a one-size-fits-all manner, giving all customers a common experience. In today’s data-driven age, a better experience can be provided that The post Amplemarket with João Batalha appeared first on Software Engineering Daily ....
Dec 09, 2021•39 min
Application observability is a fairly mature area. Engineering teams have a wide selection of tools they can choose to adopt and a significant amount of thought leadership and philosophy already exists giving guidance for managing your application. That application is going to persist data. As you scale up, your system is invariably going to experience The post Metaplane with Kevin Hu appeared first on Software Engineering Daily ....
Nov 24, 2021•44 min
Consumers are increasingly becoming aware of how detrimental it can be when companies mismanage data. This demand has fueled regulations, defined standards, and applied pressure to companies. Modern enterprises need to consider corporate risk management and regulatory compliance. In this interview, I speak with Terry O’Daniel, Director of Engineering (Risk & Compliance) at Instacart. The post Risk and Compliance with Terry O’Daniel appeared first on Software Engineering Daily ....
Nov 23, 2021•58 min