177: Vector Databases
Nov. 4, 2024, 4 p.m.
Intro topic: Buying a Car
News/Links:
- Cognitive Load is what Matters
- Diffusion models are Real-Time Game Engines
- Your Company Needs Junior Devs
- Seamless Streaming / Fish Speech / LLaMA Omni
Book of the Show
- Patrick:
- Thought Emporium YouTube
- Jason:
- Novel Minds
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
- Escape Simulator
- Jason:
- Cursor IDE
Topic: Vector Databases (~54 min)
- How computers represent data traditionally
- ASCII values
- RGB values
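As a small illustration of the two representations above, the sketch below turns a string into ASCII codes and packs an RGB color into a single integer. The sample word and color are arbitrary choices, not examples from the episode.

```python
# Each character maps to an integer code point; ASCII covers 0-127.
text = "cat"
ascii_codes = [ord(ch) for ch in text]

# A color is three 8-bit channels (red, green, blue), each in 0-255.
orange = (255, 165, 0)

# Pack the three channels into one 24-bit integer, as in a hex color code.
packed = (orange[0] << 16) | (orange[1] << 8) | orange[2]

print(ascii_codes)       # [99, 97, 116]
print(f"#{packed:06X}")  # #FFA500
```

These encodings are exact, but numeric closeness carries no meaning: "cat" and "car" differ by a single code, while "cat" and "kitten" share almost nothing. That gap is what embeddings are meant to close.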
- How traditional compression works
- Huffman encoding (tree structure)
- Lossy example: Fourier Transform & store coefficients
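The Huffman tree idea can be sketched in a few lines of Python. This is a minimal illustration, assuming character-level symbols and a bit string as output; the function names (`huffman_codes`, `encode`, `decode`) are made up for this sketch and are not from the episode.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in text to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    # Huffman codes are prefix-free, so greedy matching is unambiguous.
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For example, `encode("abracadabra", huffman_codes("abracadabra"))` yields 23 bits, versus 88 bits at one byte per character, and `decode` recovers the original string exactly. Frequent symbols get short codes, rare ones get long codes, and nothing is lost, in contrast to the lossy Fourier approach.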
- How embeddings are computed
- Pairwise (contrastive) methods
- Forward models (self-supervised)
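For the pairwise case, one common formulation is a margin-based contrastive loss: similar pairs are pulled together and dissimilar pairs are pushed apart until they are at least `margin` away. The sketch below is an assumption about what such a loss can look like, not necessarily the one discussed in the episode; in practice the embeddings come from a trained model and the loss is minimized with gradient descent.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    """Loss for one pair of embeddings (margin is a hypothetical default)."""
    d = euclidean(emb_a, emb_b)
    if is_similar:
        # Similar pairs are penalized for any distance between them.
        return d ** 2
    # Dissimilar pairs are penalized only when closer than the margin.
    return max(0.0, margin - d) ** 2


# A dissimilar pair that sits too close together produces a nonzero loss.
print(contrastive_loss([0.0, 0.0], [0.0, 0.5], is_similar=False))  # 0.25
```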
- Similarity metrics
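Three similarity metrics that vector search commonly uses are the dot product, cosine similarity, and Euclidean distance. The plain-Python sketch below shows how each is computed; the episode notes do not say which metrics were covered, so this selection is an assumption.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores vector length."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(dot([1, 2, 3], [4, 5, 6]))           # 32
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

Note that the dot product and cosine similarity grow with similarity, while Euclidean distance shrinks, so ranking code has to sort in the right direction for each.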
- Approximate Nearest Neighbors (ANN)
- Sub-Linear ANN
- Clustering
- Space Partitioning (e.g. K-D Trees)
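The clustering approach can be sketched as an inverted-file style index: group vectors around centroids with a few rounds of k-means, then search only the clusters whose centroids are closest to the query. This is a toy sketch under stated assumptions (pure Python, squared Euclidean distance, made-up names `build_index`, `search`, and `n_probe`), not how any particular vector database implements it.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Return one bucket of vector indices per centroid."""
    buckets = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        buckets[nearest].append(idx)
    return buckets


def build_index(vectors, n_clusters, n_iters=10, seed=0):
    """Run a few k-means rounds, then return (centroids, buckets)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(n_iters):
        buckets = assign(vectors, centroids)
        for c, members in enumerate(buckets):
            if members:  # leave an empty cluster's centroid where it is
                columns = zip(*(vectors[i] for i in members))
                centroids[c] = [sum(col) / len(members) for col in columns]
    return centroids, assign(vectors, centroids)


def search(query, vectors, centroids, buckets, n_probe=1):
    """Scan only the n_probe clusters nearest the query; return a vector index."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:n_probe] for i in buckets[c]]
    if not candidates:
        return None
    return min(candidates, key=lambda i: sq_dist(query, vectors[i]))
```

With `n_probe` equal to the number of clusters the search degenerates to an exact linear scan; smaller values scan fewer vectors at the risk of missing the true nearest neighbor, which is the "approximate" part of ANN.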
- What a vector database does
- Perform nearest-neighbors with many different similarity metrics
- Store the vectors and the data structures to support sub-linear ANN
- Handle updates, deletes, rebalancing/reclustering, backups/restores
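To make those responsibilities concrete, here is a toy in-memory store with upserts, deletes, and a choice of similarity metric. It is only a sketch: `ToyVectorStore` and its methods are hypothetical names, the search is a brute-force linear scan, and a real vector database would replace that scan with a sub-linear index and add persistence, rebalancing, and backups.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cosine(a, b):
    denom = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / denom if denom else 0.0


def _neg_euclidean(a, b):
    # Negated so that, like the other metrics, higher means more similar.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


METRICS = {"dot": _dot, "cosine": _cosine, "euclidean": _neg_euclidean}


class ToyVectorStore:
    def __init__(self):
        self._items = {}  # item_id -> (vector, payload)

    def upsert(self, item_id, vector, payload=None):
        """Insert a new vector or overwrite an existing one."""
        self._items[item_id] = (list(vector), payload)

    def delete(self, item_id):
        self._items.pop(item_id, None)

    def search(self, query, k=3, metric="cosine"):
        """Return the ids of the k most similar items, best first."""
        score = METRICS[metric]
        ranked = sorted(
            self._items,
            key=lambda item_id: score(query, self._items[item_id][0]),
            reverse=True,
        )
        return ranked[:k]


store = ToyVectorStore()
store.upsert("a", [1.0, 0.0])
store.upsert("b", [0.0, 1.0])
store.upsert("c", [0.9, 0.1])
print(store.search([1.0, 0.0], k=2))  # ['a', 'c']
```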
- Examples
- pgvector: a vector similarity search extension for PostgreSQL
- Weaviate, Pinecone
- Milvus