Joan Fontanals - Principal Engineer - Jina AI

Jan 19, 2022, 57 min

Episode description

Topics:

00:00 Intro

00:42 Joan's background

01:46 What attracted Joan's attention in Jina as a company and product?

04:39 Main area of focus for Joan in the product

05:46 How does the open source model work for Jina?

08:38 Deeper dive into Jina.AI as a product and technology stack

11:57 Does Jina fit the use cases of smaller and mid-size players with smaller amounts of data?

13:45 KNN/ANN algorithms available in Jina

16:05 BigANN competition and BuddyPQ, a 12% increase in recall over FAISS

17:07 Does Jina support customers in model training? Finetuner

20:46 How does Jina framework compare to Vector Databases?

26:46 Jina's investment in user-friendly APIs

31:04 Applications of Jina beyond search engines, like question answering systems

33:20 How to bring bits of neural search into traditional keyword retrieval? Connection to model interpretability

41:14 Does Jina allow going multimodal, including images, audio, etc.?

46:03 The magical question of Why

55:20 Product announcement from Joan

Order your Jina swag at https://docs.google.com/forms/d/e/1FAIpQLSedYVfqiwvdzWPX-blCpVu-tQoiFiUJQz2QnIHU1ggy1oyg/ with promo code vectorPodcastxJinaAI

Show notes:

- Jina.AI: https://jina.ai/

- HNSW + PostgreSQL Indexer: [GitHub - jina-ai/executor-hnsw-postgres: A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL](https://github.com/jina-ai/executor-h...)

- pqlite: [GitHub - jina-ai/pqlite: A fast embedded library for Approximate Nearest Neighbor Search integrated with the Jina ecosystem](https://github.com/jina-ai/pqlite)

- BuddyPQ: [Billion-Scale Vector Search: Team Sisu and BuddyPQ | by Dmitry Kan | Big-ANN-Benchmarks | Nov, 2021 | Medium](https://medium.com/big-ann-benchmarks...)

- PaddlePaddle: [GitHub - PaddlePaddle/Paddle: PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)](https://github.com/PaddlePaddle/Paddle)

- Jina Finetuner: [Finetuner 0.3.1 documentation](https://finetuner.jina.ai/)

- [Not All Vector Databases Are Made Equal | by Dmitry Kan | Towards Data Science](https://towardsdatascience.com/milvus...)

- Fluent interface (method chaining): [Fluent interfaces in Python | Florian Einfalt – Developer](https://florianeinfalt.de/posts/fluen...)

- Sujit Pal’s blog: [Salmon Run](http://sujitpal.blogspot.com/)

- ByT5: Towards a token-free future with pre-trained byte-to-byte models https://arxiv.org/abs/2105.13626

Special thanks to Saurabh Rai for the Podcast Thumbnail: https://twitter.com/srbhr_ https://www.linkedin.com/in/srbh077/