**The intersection between DataOps and privacy** DataOps is considered by many to be the new era of data management: a set of principles that emphasizes communication, collaboration, integration, and automated cooperation between the different teams in an organization that deal with data, from data engineers and data scientists to data analysts. But is there any relation between DataOps and data privacy protection? Can organizations leverage DataOps to ensure that their data is privacy c...
Dec 03, 2020•33 min•Season 1 Ep. 5
**Are Privacy Enhancing Technologies a myth?** Data privacy and machine learning are here to stay, and there’s no doubt they’re the hot trends to follow. But do they need to clash with each other? Can these titans coexist? It seems that 2020 and 2021 will finally be the years of Privacy Enhancing Technologies. But what are they, after all? How are these technologies being used and leveraged by organizations? Useful links: https://medium.com/@francis_49362/differential-privacy-not-a-c...
Nov 26, 2020•23 min•Season 1 Ep. 4
Coffee Sessions #19 with Barr Moses of Monte Carlo, Introducing Data Downtime: How to Prevent Broken Data Pipelines with Observability co-hosted by Vishnu Rachakonda //Bio Barr Moses is CEO & Co-Founder of Monte Carlo, a data observability company backed by Accel and other top Silicon Valley investors. Previously, she was VP Customer Operations at customer success company Gainsight, where she helped scale the company 10x in revenue and among other functions, built the data/analytics team. Pr...
Nov 24, 2020•1 hr 1 min•Season 1 Ep. 19
MLOps community meetup #43! Last Wednesday, we talked to Nathan Benaich, General Partner at Air Street Capital and Timothy Chen, Managing Partner at Essence VC about The MLOps Landscape. // Abstract: In this session, we explored the MLOps landscape through the eyes of two accomplished investors. Tim and Nathan shared with us their experience in looking at hundreds of ML and MLOps companies each year to highlight major insights they have gained. What do the ML infrastructure and tooling landscape...
Nov 23, 2020•59 min•Season 1 Ep. 43
**AI and ethical dilemmas** Artificial Intelligence is seen by many as a vehicle for great transformation, but for others, it still remains a mystery, and many questions remain unanswered: will AI systems rule us one day? Can we trust AI to rule our criminal systems? Maybe create political campaigns and dominate political advertisements? Or maybe something less harmful, do our laundry? Some of these questions may sound absurd, but they are for sure making people shift from thinking purely about ...
Nov 19, 2020•52 min•Season 1 Ep. 3
MLOps community meetup #42! Last Wednesday, we talked to Mark Craddock, Co-Founder & CTO, Global Certification and Training Ltd (GCATI), about UN Global Platform. // Abstract: Building a global big data platform for the UN. Streaming 600,000,000+ records / day into the platform. The strategy was developed using Wardley Maps and the Platform Design Toolkit. // Bio: Mark contributed to the Cloud First policy for the UK Public sector and was one of the founding architects for the UK Government's G-C...
Nov 16, 2020•59 min•Season 1 Ep. 43
What are regulations saying about data privacy? We are already aware of the importance of using Machine Learning to improve businesses. Nevertheless, Machine Learning needs data to feed on, and in many cases this data might be considered sensitive information. So, does this mean that with new privacy regulations, access to data will become more and more difficult? Are the days of ML and Data Science numbered? Or will the machines beat privacy? To answer all these questions I’ve invited Cat Coode, a...
Nov 12, 2020•36 min•Season 1 Ep. 2
In this episode, we talked to Elizabeth Chabot, Consultant at Deloitte, about When You Say Data Scientist Do You Mean Data Engineer? Lessons Learned From StartUp Life. // Key takeaways: If you have a data product that you want to function in production, you need MLOps. Education needs to happen about the data product life cycle, noting that ML is just part of the equation. Titles need to be defined to help outside users understand the differences in roles. // Abstract: ML and AI...
Nov 10, 2020•1 hr 1 min•Season 1 Ep. 42
MLOps community meetup #41! Last Wednesday's episode was so exciting that some attendees couldn't help asking when the next season of their favorite series is coming! The conversation was around Metaflow: Supercharging Data Scientist Productivity with none other than Netflix’s very own Ravi Kiran Chirravuri. // Abstract: Netflix's unique culture affords its data scientists an extraordinary amount of freedom. They are expected to build, deploy, and operate large machine learning workflows autonomously wit...
Nov 10, 2020•1 hr•Season 1 Ep. 41
Coffee Sessions #18 with Luigi Patruno of ML in Production, a Centralized Repository of Best Practices Summary Luigi Patruno and ML in production MLOps workflow: Knowledge sharing and best practices Objective: learn! Links: ML in production: https://mlinproduction.com/ Why you start MLinProduction: https://mlinproduction.com/why-i-started-mlinproduction/ Luigi Patruno: a man whose goal is to help data scientists, ML engineers, and AI product managers, build and operate machine learning sys...
Nov 09, 2020•47 min•Season 1 Ep. 18
This is the first episode of a podcast series on Machine Learning and Data privacy. Machine Learning is the key to the new revolution in many industries. Nevertheless, ML does not exist without data, and a lot of it, which in many cases results in the use of sensitive information. With new privacy regulations, access to data is harder today, but does that mean the days of ML and Data Science are numbered? Will the machines beat privacy? Don’t forget to subscr...
Nov 05, 2020•19 min•Season 1 Ep. 1
MLOps level 2: CI/CD pipeline automation For a rapid and reliable update of the pipelines in production, you need a robust automated CI/CD system. This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters. They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment. Figure 4. CI/CD and automated ML pipeline. This MLOps setup includes ...
Nov 03, 2020•1 hr 1 min•Season 1 Ep. 17
MLOps community meetup #40! Last Wednesday, we talked to Theofilos Papapanagiotou, Data Science Architect at Prosus, about Hands-on Serving Models Using KFserving. // Abstract: We looked at some popular model formats like the SavedModel of Tensorflow, the Model Archiver of PyTorch, pickle and ONNX, to understand how the weights of the NN are saved there, the graph, and the signature concepts. We discussed the relevant resources of the deployment stack of Istio (the Ingress gateway, the sidecar a...
Oct 30, 2020•58 min•Season 1 Ep. 40
MLOps community meetup #39! Last week we talked to Ivan Nardini, Customer Engineer at SAS, about Operationalize Open Source Models with SAS Open Model Manager. // Abstract: Analytics are open. By their nature, open source technologies allow agile development of models, but putting those models into production is difficult. The goal of SAS is to support customers in operationalizing analytics. In this meetup, I present SAS Open Model Manager, a containerized ...
Oct 27, 2020•57 min•Season 1 Ep. 38
//Bio Satish built compilers, profilers, IDEs, and other dev tools for over a decade. At Microsoft Research, he saw his colleagues solving hard program analysis problems using Machine Learning. That is when he got curious and started learning. His approach to ML is influenced by his software engineering background of building things for production. He has a keen interest in doing ML in production, which is a lot more than training and tuning the models. The first step is to understan...
Oct 26, 2020•57 min•Season 1 Ep. 16
James Sutton is an ML Engineer focused on helping enterprises bridge the gap between what they have now and where they need to be to enable production-scale ML deployments. ----------- Connect With Us ✌️------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with David on LinkedIn: https://www.lin...
Oct 20, 2020•1 hr 2 min•Season 1 Ep. 15
Parallel Computing with Dask and Coiled Python makes data science and machine learning accessible to millions of people around the world. However, historically Python hasn't handled parallel computing well, which leads to issues as researchers try to tackle problems on increasingly large datasets. Dask is an open source Python library that extends the existing Python data science stack (NumPy, pandas, scikit-learn, Jupyter, ...) with parallel and distributed computing. Today Dask has been ...
Oct 19, 2020•57 min•Season 1 Ep. 37
This time we talked about one of the most pressing questions for any MLOps practitioner: how to choose the right tools for your ML team, given the huge number of open-source and proprietary MLOps tools available on the market today. We discussed several criteria to rely on when choosing a tool, including: - The requirements of the particular team use-cases - The scaling capacity of the tool - The cost of migration from a chosen tool - The cost of teaching the team to use this tool - The com...
Oct 18, 2020•1 hr 1 min•Season 1 Ep. 13
Dask: What is it? Parallelism for analytics. What is parallelism? Doing a lot at once by splitting tasks into smaller subtasks that can be processed in parallel (at the same time). Distributing work across multiple machines and then combining the results. Helpful for CPU-bound work: doing a bunch of calculations on the CPU, where the rate at which a process progresses is limited by the speed of the CPU. Concurrency? Similar, but things don’t have to happen at the same time; they can happen asynchronously. T...
Oct 12, 2020•57 min•Season 1 Ep. 14
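The split-and-combine idea in the show notes above can be sketched in a few lines. This is only an illustration using Python's standard-library `concurrent.futures`, not Dask itself; the helper names (`chunked`, `sum_of_squares`, `parallel_sum_of_squares`) are made up for this example. A thread pool is used so the sketch runs anywhere, with the caveat that for truly CPU-bound work in CPython a process pool (or a library such as Dask) would be the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal parts."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """The subtask: it only needs its own chunk, so chunks are independent."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_workers=4):
    """Split the work, run the subtasks on a pool, then combine the results.

    Hypothetical helper for illustration. Threads keep the example portable;
    swapping in ProcessPoolExecutor is the usual route to real CPU parallelism.
    """
    chunks = chunked(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Calling `parallel_sum_of_squares(list(range(10)))` splits the ten numbers into chunks, squares and sums each chunk on a worker, and adds the partial sums together.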
Why was Flyte built at Lyft? What sorts of requirements does an ML infrastructure team have at Lyft? What problems does it solve / use cases? Where does it fit in the ML and data ecosystem? What is the vision? Who should consider using it? Learnings as the engineering team tried to bootstrap an open-source community. Ketan Umare is a senior staff software engineer at Lyft responsible for technical direction of the Machine Learning Platform and is a founder of the Flyte project. Before Flyte he...
Oct 10, 2020•1 hr 5 min•Season 1 Ep. 12
Round 3 analyzing the Google paper "Continuous Delivery and Automation Pipelines in ML" // Show Notes Data Science Steps for ML Data extraction: You select and integrate the relevant data from various data sources for the ML task. Data analysis: You perform exploratory data analysis (EDA) to understand the available data for building the ML model. This process leads to the following: Understanding the data schema and characteristics that are expected by the model. Identifying the data preparatio...
Oct 04, 2020•1 hr 6 min•Season 1 Ep. 11
MLOps community meetup #36! This week we talk to David Hershey, Solutions Engineer at Determined AI, about Moving Deep Learning from Research to Production with Determined and Kubeflow. // Key takeaways: What components are needed to do inference in ML. How to structure models for ML inference. How a model registry helps organize your models for easy consumption. How you can set up reusable and easy-to-upgrade inference pipelines. // Abstract: Translating the research that goes into creating a ...
Oct 04, 2020•56 min•Season 1 Ep. 36
Second installment of David and Demetrios reviewing the Google paper on continuous training and automated pipelines. They dive deep into machine learning monitoring and what continuous training actually entails. Some key highlights are: Automatically retraining and serving the models: when to do it? Outlier detection. Drift detection. Outlier detection: What is it? How do you deal with it? Drift detection: Individual features may start to drift. This could be a bug or it could be perfectl...
Sep 22, 2020•1 hr 8 min•Season 1 Ep. 10
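To make the per-feature drift idea above concrete, here is a minimal sketch. It is an assumption-laden illustration, not the method from the episode or the Google paper: it flags a feature when the mean of a recent window sits more than `threshold` standard errors away from the training-time mean. The function name `drifted_features`, the dictionary layout, and the default threshold are all hypothetical.

```python
from statistics import mean, stdev


def drifted_features(baseline, live, threshold=3.0):
    """Return the names of features whose live mean has shifted.

    baseline and live map feature name -> list of numeric values.
    A feature is flagged when its live-window mean is more than `threshold`
    standard errors from the baseline mean (a simple z-test style check,
    chosen here only for illustration).
    """
    flagged = []
    for name, ref_values in baseline.items():
        ref_mean = mean(ref_values)
        ref_std = stdev(ref_values)  # needs at least two baseline values
        live_values = live[name]
        # Standard error of the live-window mean under the baseline spread.
        std_err = ref_std / len(live_values) ** 0.5
        if std_err == 0:
            shifted = mean(live_values) != ref_mean
        else:
            shifted = abs(mean(live_values) - ref_mean) / std_err > threshold
        if shifted:
            flagged.append(name)
    return flagged
```

A flagged feature is only a prompt to investigate: as the notes say, the shift could be a bug or it could be legitimate, so a check like this would typically feed an alert rather than trigger retraining on its own.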
MLOps Meetup #34! This week we talk to Kai Waehner about the beast that is Apache Kafka and how many different ways you can use it! // Key takeaways: -Kafka is much more than just messaging -Kafka is the de facto standard for processing huge volumes of data at scale in real-time -Kafka and Machine Learning are complementary for various use cases (including data integration, data processing, model training, model scoring, and monitoring) // Abstract: The combination of Apache Kafka, tiered storag...
Sep 17, 2020•53 min•Season 1 Ep. 35
While machine learning is spreading like wildfire, very little attention has been paid to the ways that it can go wrong when moving from development to production. Even when models work perfectly, they can be attacked and/or degrade quickly if the data changes. Having a well-understood MLOps process is necessary for ML security! Using Kubeflow, we demonstrated the common ways machine learning workflows go wrong, and how to mitigate them using MLOps pipelines to provide reproducibility, va...
Sep 14, 2020•56 min•Season 1 Ep. 34
In the last episode, we covered how Google is thinking about MLOps and how automation plays a key part in their view of MLOps. We started to talk about CI, CD, and the role they play in a pipeline setup for CT. In the next episode, we'll pick up where we left off, starting our discussion of CT and some of the reasons you’d want to set up a pipeline with continuous training in the first place. Join our Slack community: https://join.slack.com/t/mlops-community/shared_invite/zt-391hcpnl-aSwNf_X5Ry...
Sep 14, 2020•59 min•Season 1 Ep. 9
Yoav is the builder behind Say Less, an AI-powered email summarization tool that was recently featured on the front page of Hacker News and Product Hunt. In this talk, Yoav will walk us through the end-to-end process of building the tool, from the prototype phase to deploying the model as a realtime HTTP endpoint. Yoav Zimmerman is the engineer / founder behind Model Zoo, a machine learning deployment platform focused on ease-of-use. He has previously worked at Determined AI on large-scale deep ...
Sep 08, 2020•53 min•Season 1 Ep. 33
|| Links Referenced in the Show || General Info: https://medium.com/@paktek123 Load Balancer Series: https://medium.com/load-balancer-series Upcoming Open Src: https://medium.com/upcoming-open-source Some Libraries Neeran maintains: https://github.com/paktek123/elasticsearch-crystal Some libraries Neeran used to maintain: https://github.com/microsoft/pgtester (and https://medium.com/yammer-engineering/testing-postgresql-scripts-with-rspec-and-pg-tester-c3c6c1679aec) Some interesting projects Nee...
Sep 08, 2020•58 min•Season 1 Ep. 8
We trained a Transformer neural net on ambient music to see if a machine can compose with the great masters. Ambient is a soft, flowing, ethereal genre of music that I’ve loved for decades. There are all kinds of ambient, from white noise, to tracks that mimic the murmur of soft summer rain in a sprawling forest, but Dan favors ambient that weaves together environmental sounds and dreamy, wavelike melodies into a single, lush tapestry. Can machine learning ever hope to craft something so ...
Sep 05, 2020•56 min•Season 1 Ep. 32
MLOps and DevOps have a large number of parallels. Many of the techniques, practices, and processes used for traditional software projects can be followed almost exactly in ML projects. However, the day-to-day of an ML project is usually significantly different from a traditional software project. So while the ideas and principles can still apply, it’s important to be aware of the core aims of DevOps when applying them. Damian is a Cloud Advocate specializing in DevOps and MLOps. After spending ...
Aug 31, 2020•56 min•Season 1 Ep. 7