
Vector Databases Under the Hood: ChromaDB, Pinecone, Qdrant

Dive deep into how vector databases work, exploring ANN algorithms, architecture, and core features. Compare ChromaDB, Pinecone, and Qdrant for your AI projects.

Khader Vali · May 26, 2026 · 17 min read

Vector Databases Under the Hood: ChromaDB, Pinecone, and Qdrant Explained

The rise of Large Language Models (LLMs) and generative AI has fundamentally shifted how we interact with data. Suddenly, the nuanced meaning and semantic relationships within information have become paramount. Traditional relational or NoSQL databases, designed for exact matches and structured queries, struggle in this new paradigm. This is where vector databases step in, offering a specialized solution to store, index, and query high-dimensional vectors, enabling powerful semantic search, recommendation systems, and AI-driven applications.

As a senior software engineer navigating this exciting landscape, you’ve likely encountered the need for a robust vector store. But with an increasing number of options, how do you choose? More importantly, what’s actually happening beneath the surface when you perform a semantic search? In this comprehensive article, we’ll peel back the layers, demystifying the core mechanics of vector databases and then conducting a deep dive into three prominent players: ChromaDB, Pinecone, and Qdrant. Our goal is to equip you with the knowledge to make informed decisions and architect intelligent applications.


Understanding Vector Embeddings: The Foundation

Before we delve into databases, we must grasp their fundamental building blocks: vector embeddings. An embedding is a numerical representation of an object (text, image, audio, video, etc.) in a high-dimensional space. The magic lies in how these numbers are generated: objects with similar meanings or characteristics are mapped to points that are close to each other in this vector space, while dissimilar objects are far apart.

This transformation is typically performed by machine learning models (e.g., Transformer models like BERT, OpenAI’s Ada, or various image encoders). These models learn to capture the semantic essence of the input and project it into a fixed-size array of floating-point numbers. For instance, the words “king” and “queen” would have embeddings that are numerically closer to each other than those of “king” and “bicycle.”

Example: Generating Text Embeddings with OpenAI

Let’s illustrate with a simple Python example using OpenAI’s embedding API:


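from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
# Alternatively, pass it explicitly: OpenAI(api_key="YOUR_OPENAI_API_KEY")
client = OpenAI()

def get_embedding(text, model="text-embedding-ada-002"):
    # Newlines are replaced with spaces before the text is sent to the API
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding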

# Generate embeddings for a few sentences
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "A fast animal with brown fur leaps over a sleepy canine."
text3 = "The car sped down the highway."

embedding1 = get_embedding(text1)
embedding2 = get_embedding(text2)
embedding3 = get_embedding(text3)

print(f"Embedding 1 (length {len(embedding1)}): {embedding1[:5]}...") # print first 5 elements
print(f"Embedding 2 (length {len(embedding2)}): {embedding2[:5]}...")
print(f"Embedding 3 (length {len(embedding3)}): {embedding3[:5]}...")

# You would then store these embeddings in a vector database.

The output of `get_embedding` is a list of floats, often with hundreds or thousands of dimensions (e.g., Ada-002 produces 1536 dimensions). These high-dimensional vectors are what vector databases are designed to manage.
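To get a feel for what “managing” these vectors means in practice, here is a rough storage estimate. It is a back-of-the-envelope sketch that assumes each dimension is stored as a 4-byte float32 and ignores index structures and metadata; the helper name `raw_vector_bytes` is made up for this illustration.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vectors alone (no index, no metadata)."""
    return num_vectors * dimensions * bytes_per_value

# One million 1536-dimensional embeddings stored as float32 (assumption)
total = raw_vector_bytes(1_000_000, 1536)
print(f"{total / 1e9:.2f} GB")  # 6.14 GB
```

Even before any index is built, a million Ada-002 vectors occupy several gigabytes, which is why memory usage features so heavily in the trade-offs discussed later.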

How Vector Databases Work Under the Hood

At its core, a vector database excels at one thing: finding the “nearest neighbors” to a given query vector in a vast collection of other vectors. This “similarity search” is computationally intensive, especially with high dimensions and millions or billions of vectors. Exact nearest neighbor search (brute-force) would involve calculating the distance between the query vector and every other vector in the database, which is infeasible for real-time applications at scale.
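For reference, exact (brute-force) search fits in a few lines. The sketch below uses plain Python lists and Euclidean distance purely for illustration; the function names are hypothetical and not taken from any particular database.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(query, vectors, k):
    """Exact k-NN: score every stored vector, then keep the k closest."""
    scored = [(euclidean(query, vec), idx) for idx, vec in enumerate(vectors)]
    scored.sort()
    return [idx for _, idx in scored[:k]]

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
print(brute_force_knn([1.0, 1.0], vectors, k=2))  # [1, 3]
```

Every query touches every stored vector, so the cost grows linearly with both the collection size and the number of dimensions.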

This is where Approximate Nearest Neighbor (ANN) algorithms come into play. ANN algorithms trade a small, usually tunable, amount of recall for large gains in speed, which is what makes similarity search practical at this scale. Let’s explore the key components and mechanisms.

1. Similarity Metrics: Measuring Closeness

Before diving into indexing, we need to understand how “closeness” is quantified. Different similarity metrics are suitable for different types of data and embedding models:

  • Cosine Similarity: Measures the cosine of the angle between two vectors. It focuses on the orientation, not the magnitude, making it excellent for text embeddings where direction often signifies semantic meaning. A value of 1 means identical direction, 0 means orthogonal, and -1 means opposite.
  • Euclidean Distance (L2 Distance): The straight-line distance between two points in Euclidean space. Shorter distance means higher similarity. Often used for image embeddings or general-purpose metrics.
  • Dot Product: The sum of the products of corresponding components. Larger dot products generally indicate higher similarity. Some embedding models are specifically trained to optimize dot product for similarity.

Vector databases allow you to specify the metric, as it directly impacts how vectors are compared and indexed.
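The three metrics above can be compared side by side on toy vectors. This is a minimal pure-Python sketch; real systems use vectorized or SIMD implementations, and the function names here are illustrative only.

```python
import math

def dot(a, b):
    """Sum of the products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 3.0]  # orthogonal to a

print(cosine_similarity(a, b))   # 1.0 (magnitude is ignored)
print(euclidean_distance(a, b))  # 1.0 (magnitude matters)
print(dot(a, b))                 # 2.0 (grows with magnitude)
print(cosine_similarity(a, c))   # 0.0 (orthogonal)
```

Note that for vectors normalized to unit length, dot product and cosine similarity produce the same ranking, which is why the choice often follows whatever the embedding model was trained with.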

2. Indexing Algorithms (Approximate Nearest Neighbor – ANN)

ANN algorithms are the secret sauce of vector databases. They build data structures that allow for rapid searches, avoiding the need to compare every vector. Here are some prominent ones:

a. HNSW (Hierarchical Navigable Small World)

HNSW is one of the most popular and efficient ANN algorithms, known for its excellent balance of search speed and recall (how many actual nearest neighbors are found). It builds a multi-layer graph structure:

  • Layers: Imagine several layers, like an onion. The top layers contain fewer nodes (vectors) but with longer connections, allowing for quick traversal across large distances. Lower layers have more nodes and shorter connections, enabling fine-grained search.
  • Nodes (Vectors): Each vector in the database is a node in the graph.
  • Edges (Connections): Each node is connected to a small number of its approximate nearest neighbors.

How HNSW Search Works (Simplified):
1. Start at the Top Layer: Begin at the index’s entry point in the topmost layer (typically the node that was assigned the highest layer when it was inserted).
2. Greedy Search: From the current node, examine its neighbors. Move to the neighbor that is closest to the query vector.
3. Traverse Layers: Repeat this greedy search within the current layer until no closer neighbor is found. Then “drop down” to the same node in the layer below and use it as the new starting point.
4. Refine Search: Continue this process, refining the search in progressively denser layers, until you reach the bottom layer.
5. Candidate List: In the bottom layer, maintain a dynamic candidate list of the closest vectors found so far (its size is a tunable search parameter, often called ef) and return the best ‘k’ from it.

Architectural View (in words):


Layer L (Sparse, Long connections)
  Node A --- Node B --- Node C
   |          |          |
Layer L-1 (Denser, Medium connections)
  Node A'--Node B'--Node C'--Node D'
   |          |          |
Layer 0 (Densest, Short connections, all vectors)
  Node A''-Node B''-Node C''-Node D''-Node E''-Node F''

Every node that appears in a higher layer also appears in all layers below it, while edges connect nodes within the same layer only. Dropping down a layer therefore means continuing from the same node with its denser, shorter-range set of neighbors, so the search effectively navigates from coarse to fine granularity.
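The layered descent can be sketched on a tiny hand-built graph. This is a simplified illustration, not a full HNSW implementation: it tracks a single best node per layer (no candidate list), the graph is written out by hand rather than constructed by the insertion algorithm, and all names are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search_layer(query, entry, neighbors, points):
    """Walk one layer greedily until no neighbor is closer to the query."""
    current = entry
    improved = True
    while improved:
        improved = False
        for candidate in neighbors.get(current, []):
            if dist(points[candidate], query) < dist(points[current], query):
                current = candidate
                improved = True
    return current

def layered_search(query, entry, layers, points):
    """Descend from the sparsest layer to layer 0, reusing the best node as the next entry."""
    current = entry
    for neighbors in layers:  # ordered top layer first
        current = greedy_search_layer(query, current, neighbors, points)
    return current

# Toy index: six 1-D points; each higher layer keeps a subset of the nodes below it.
points = {0: [0.0], 1: [2.0], 2: [4.0], 3: [6.0], 4: [8.0], 5: [10.0]}
layers = [
    {0: [5], 5: [0]},                                              # top: long links
    {0: [2], 2: [0, 5], 5: [2]},                                   # middle
    {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]},  # layer 0: all nodes
]
print(layered_search([6.5], entry=0, layers=layers, points=points))  # 3
```

For the query 6.5 the search hops 0 → 5 in the top layer, 5 → 2 in the middle layer, and 2 → 3 in layer 0, reaching the true nearest point (6.0) after only a handful of distance computations.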

b. IVF (Inverted File Index)

IVF is another common ANN algorithm, particularly effective for very large datasets; combined with vector compression such as product quantization (IVF-PQ), it is also memory-efficient. It works by partitioning (coarsely quantizing) the vector space:

  • Clustering (Voronoi Cells): The entire dataset of vectors is first clustered into ‘n’ groups using algorithms like K-Means. Each cluster has a centroid (representative vector).
  • Inverted File List: For each centroid, an “inverted list” is maintained, containing all the vectors that belong to its cluster.

How IVF Search Works (Simplified):
1. Quantize Query: Given a query vector, first find the ‘m’ closest centroids to it.
2. Probe Inverted Lists: Retrieve the inverted lists associated with these ‘m’ centroids.
3. Refine Search: Perform a brute-force (exact) nearest neighbor search only within the vectors found in these ‘m’ lists, significantly reducing the search space compared to the entire database.

Architectural View (in words):


Centroid 1 --> [ Vector 1.1, Vector 1.2, Vector 1.3, ... ]
Centroid 2 --> [ Vector 2.1, Vector 2.2, Vector 2.3, ... ]
Centroid 3 --> [ Vector 3.1, Vector 3.2, Vector 3.3, ... ]
...
Centroid N --> [ Vector N.1, Vector N.2, Vector N.3, ... ]

This approach offers a good balance, but recall can be sensitive to the number of centroids and how many lists are “probed.”
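The effect of probing more or fewer lists is easy to reproduce on toy data. The sketch below assumes the centroids have already been computed (for example by K-Means) and hard-codes them; `build_ivf`, `ivf_search`, and `nprobe` are illustrative names, with `nprobe` playing the role of ‘m’ above.

```python
import math
from collections import defaultdict

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = defaultdict(list)
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Exact search restricted to the inverted lists of the nprobe closest centroids."""
    probed = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))[:nprobe]
    candidates = [vec_id for c in probed for vec_id in lists[c]]
    candidates.sort(key=lambda vec_id: dist(query, vectors[vec_id]))
    return candidates[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]  # assumed output of a prior clustering step
vectors = [[1.0, 1.0], [0.0, 2.0], [9.0, 9.0], [4.9, 4.9], [5.2, 5.2]]
lists = build_ivf(vectors, centroids)

query = [5.1, 5.1]
print(ivf_search(query, vectors, centroids, lists, k=2, nprobe=1))  # [4, 2]
print(ivf_search(query, vectors, centroids, lists, k=2, nprobe=2))  # [4, 3]
```

With a single probe, vector 3 is missed because it sits just across the cluster boundary from the query; probing both lists recovers it at the cost of scanning more vectors.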

c. Other Algorithms

  • LSH (Locality Sensitive Hashing): Projects high-dimensional vectors into lower-dimensional buckets using hash functions. Vectors in the same bucket are likely similar. While simple, it often provides lower recall than HNSW or IVF.
  • DiskANN, ScaNN, and FAISS (Facebook AI Similarity Search): FAISS is a library that implements many of these algorithms (including IVF and HNSW variants) and is sometimes embedded in vector databases as the indexing backend. DiskANN and ScaNN are optimized for specific scenarios: DiskANN for indexes too large to fit in RAM, which it serves largely from SSD, and ScaNN for quantization-based search.
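As an illustration of the bucketing idea behind LSH, here is a sketch of one common family, random-hyperplane hashing, which targets cosine similarity. The hyperplanes are drawn from a seeded random generator, and every name in the block is hypothetical.

```python
import random

def make_hyperplanes(num_bits, dimensions, seed=0):
    """Draw one random hyperplane (a normal vector) per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dimensions)] for _ in range(num_bits)]

def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        side = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if side >= 0 else "0")
    return "".join(bits)

planes = make_hyperplanes(num_bits=8, dimensions=3)
v = [0.2, 0.9, -0.4]
scaled = [2 * x for x in v]  # same direction, different magnitude

print(lsh_signature(v, planes) == lsh_signature(scaled, planes))  # True
```

Vectors pointing in the same direction always share a signature, and vectors separated by a small angle share most bits with high probability, so the signature can serve as a bucket key.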

The choice of ANN algorithm involves trade-offs between:

  • Recall: How many of the true nearest neighbors are found (higher is better).
  • Speed: Query latency (lower is better).
  • Memory Usage: How much RAM the index consumes (lower is better).
  • Index Build Time: How long it takes to create or update the index.
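Recall is usually measured by comparing an approximate result against the exact (brute-force) result for the same query. A minimal sketch, with a hypothetical helper name:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# The approximate search found 2 of the 3 true neighbors (ids 4 and 7, but not 3)
print(recall_at_k([4, 2, 7], [4, 3, 7]))
```

Averaging this value over a representative set of queries gives a recall figure that can be weighed against latency and memory when tuning index parameters.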

3. Data Storage and Management

A vector database isn’t just an ANN index; it’s a complete data management system. It needs to store:

  • Vectors: The high-dimensional numerical arrays.
  • Payload/Metadata: Associated information with each vector (e.g., text content, image URL, user ID, timestamps, categories). This metadata is crucial for filtering queries.
  • IDs: Unique identifiers for each vector, allowing for retrieval and updates.

These components need to be stored efficiently, often with a combination of in-memory caches, persistent storage (SSD/HDD), and distributed file systems for scalability and durability.
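Conceptually, each stored item bundles these three pieces together. The sketch below models that with a dataclass and a dictionary-backed store; it is an assumption-laden toy (no persistence, no index), and `VectorRecord` and `InMemoryStore` are invented names.

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    id: str                                       # unique identifier
    vector: list                                  # the embedding itself
    payload: dict = field(default_factory=dict)   # metadata used for filtering

class InMemoryStore:
    """Keeps records keyed by id; upsert replaces any existing record."""
    def __init__(self):
        self._records = {}

    def upsert(self, record):
        self._records[record.id] = record

    def get(self, record_id):
        return self._records.get(record_id)

    def __len__(self):
        return len(self._records)

store = InMemoryStore()
store.upsert(VectorRecord("doc1", [0.1, 0.2], {"year": 2023}))
store.upsert(VectorRecord("doc1", [0.3, 0.4], {"year": 2024}))  # same id: replaces
print(len(store), store.get("doc1").payload)  # 1 {'year': 2024}
```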

4. Querying Process with Metadata Filtering

A typical vector database query involves more than just finding similar vectors:

  1. Query Vector Generation: The user’s input (e.g., natural language query) is converted into an embedding using the same model used for the indexed data.
  2. Metadata Filtering (Pre-filtering): If the query includes metadata filters (e.g., “find documents related to ‘AI’ but only from the year 2023”), the database first filters the entire set of vectors based on this metadata. This narrows down the pool of candidates for the ANN search.
  3. ANN Search: The query vector is then passed to the ANN index, which efficiently identifies the approximate nearest neighbors from the *filtered* set of vectors.
  4. Re-ranking (Optional): Sometimes, a small number of top candidates from the ANN search might undergo a more precise (brute-force) distance calculation or be re-ranked based on additional criteria.
  5. Result Retrieval: The database retrieves the full payload/metadata for the top ‘k’ nearest neighbors and returns them to the user.

Metadata filtering is a critical feature, as it allows for hybrid search capabilities, combining the power of semantic similarity with the precision of structured queries.
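The pre-filter-then-search flow can be sketched end to end on toy data. In this illustration an exact cosine ranking stands in for the ANN index, the filter supports equality only, and `filtered_search` is a hypothetical name; real databases differ in how and when they apply filters.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filtered_search(query_vector, records, where, k):
    # Step 2: keep only records whose metadata matches every key in `where`
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    # Step 3: rank the survivors by similarity (exact scoring stands in for ANN)
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vector, r["vector"]),
        reverse=True,
    )
    # Step 5: return ids and payloads for the top k
    return [{"id": r["id"], "metadata": r["metadata"]} for r in ranked[:k]]

records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"year": 2023}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"year": 2022}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"year": 2023}},
]
hits = filtered_search([1.0, 0.05], records, where={"year": 2023}, k=1)
print(hits)  # [{'id': 'a', 'metadata': {'year': 2023}}]
```

Record "b" is close to the query but never scored, because the year filter removes it before the similarity step runs.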


Deep Dive: ChromaDB

ChromaDB positions itself as the AI-native open-source embedding database. It’s designed to be lightweight and easy to get started with, often used for local development, smaller-scale applications, or as a component within larger systems.

Architecture Description

ChromaDB offers flexible deployment options:

  • In-Memory: The simplest mode, where everything runs in the same process as your application, using RAM for storage. Great for rapid prototyping and small datasets.
  • Persistent (On-Disk): Stores data on the local filesystem, allowing persistence across restarts. Like the in-memory mode, it runs inside your application’s process; no separate server is needed.
  • Client-Server (HTTP API): You can run a ChromaDB server as a separate process, accessible via an HTTP API. This allows multiple clients to connect to a single Chroma instance.

Under the hood, ChromaDB leverages efficient data structures and ANN algorithms. While its specific implementation details can evolve, it commonly utilizes libraries like Hnswlib for its core indexing, providing a solid foundation for approximate nearest neighbor search. It manages vector storage, metadata, and the index together, providing a unified interface.

Architectural View (in words for Persistent/Client-Server):


[Your Application] <-- Python Client/HTTP API --> [ChromaDB Server]
                                                    |
                                                    V
                                          +---------------------+
                                          | Chroma Core Engine  |
                                          | (Hnswlib for ANN)   |
                                          +---------------------+
                                                    |
                                                    V
                                          +---------------------+
                                          | Persistent Storage  |
                                          | (Vectors, Metadata, |
                                          | Index on disk/SSD)  |
                                          +---------------------+

Key Features

  • Easy to Use: Simple Python API, low barrier to entry.
  • Open Source: Full control and community support.
  • Metadata Filtering: Supports filtering results based on associated metadata.
  • Pluggable Embeddings: Not tied to a specific embedding model; you can bring your own.
  • Snapshotting and Backups: Basic mechanisms for data durability.
  • Hybrid Search: Combines vector similarity with metadata filtering.

Use Cases

  • Local LLM Applications: RAG (Retrieval Augmented Generation) for chatbots or question-answering systems running locally.
  • Prototyping and Development: Quickly test ideas involving semantic search without complex setup.
  • Personal Knowledge Bases: Indexing notes, articles, or documents for semantic retrieval.
  • Small to Medium-Scale Applications: When a fully managed cloud service might be overkill or too expensive.

Code Example with ChromaDB

Let’s demonstrate creating a collection, adding embeddings, and querying with ChromaDB:


import chromadb
from chromadb.utils import embedding_functions

# 1. Initialize ChromaDB client (persistent client)
# This will create a 'chroma_data' directory for storage
client = chromadb.PersistentClient(path="./chroma_data")

# 2. Get an embedding function (using a local Sentence Transformer model for simplicity)
# In a real app, you might use OpenAI, Cohere, etc.
# Recent Chroma versions expect an EmbeddingFunction object rather than a bare function,
# so we use the built-in wrapper (requires the sentence-transformers package).
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# 3. Create or get a collection
# A collection is where your embeddings and metadata live.
# get_or_create_collection replaces the bare try/except and attaches the same
# embedding function whether the collection is new or already exists.
collection_name = "my_documents_collection"
collection = client.get_or_create_collection(
    name=collection_name,
    embedding_function=embedding_function
)
print(f"Collection '{collection_name}' is ready.")

# 4. Add documents (text, embeddings, metadata, and unique IDs)
documents = [
    "Khadervali is a software engineer with expertise in AI and cloud.",
    "The latest advancements in large language models are fascinating.",
    "Cloud computing provides scalable infrastructure for modern applications.",
    "Python is a versatile language popular for AI and web development.",
    "Exploring the nuances of vector databases for semantic search."
]
metadatas = [
    {"source": "blog", "author": "Khadervali", "year": 2023},
    {"source": "article", "topic": "LLMs", "year": 2024},
    {"source": "documentation", "topic": "Cloud", "year": 2023},
    {"source": "guide", "topic": "Programming", "year": 2022},
    {"source": "research", "topic": "VectorDB", "year": 2024}
]
ids = [f"doc{i}" for i in range(len(documents))]

# ChromaDB will automatically embed the documents using the collection's embedding
# function if you pass the 'documents' directly. If you have pre-computed embeddings,
# you can pass them via the 'embeddings' parameter.
# upsert (rather than add) keeps the script re-runnable: existing IDs are updated
# instead of being inserted a second time.
collection.upsert(
    documents=documents,
    metadatas=metadatas,
    ids=ids
)
print(f"Added {len(documents)} documents to the collection.")

# 5. Query the collection
query_text = "What is Khadervali known for?"
results = collection.query(
    query_texts=[query_text],
    n_results=2, # Number of results to return
    where={"topic": {"$ne": "Cloud"}} # Metadata filter: exclude documents about 'Cloud'
)

print("\nQuery Results:")
for i, doc in enumerate(results['documents'][0]):
    print(f"  Result {i+1}:")
    print(f"    Document: {doc}")
    print(f"    Metadata: {results['metadatas'][0][i]}")
    print(f"    Distance: {results['distances'][0][i]:.4f}")

# Example of querying without metadata filter
query_text_all = "scalable infrastructure"
results_all = collection.query(
    query_texts=[query_text_all],
    n_results=1
)
print("\nQuery Results (no filter):")
print(f"  Document: {results_all['documents'][0][0]}")
print(f"  Metadata: {results_all['metadatas'][0][0]}")
print(f"  Distance: {results_all['distances'][0][0]:.4f}")

# You can also delete documents
# collection.delete(ids=["doc0"])
# print("Deleted doc0.")

Deep Dive: Pinecone

Pinecone is a fully managed, cloud-native vector database designed for production-scale AI applications. It abstracts away the complexities of infrastructure, scaling, and index management, allowing developers to focus purely on building their AI features.

Architecture Description

Pinecone operates as a distributed system in the cloud, offering high availability, scalability, and performance. Its architecture is proprietary but based on well-established distributed systems principles and optimized ANN algorithms. Key components typically include:

  • Control Plane: Manages clusters, indexes, and API access. Handles metadata, configurations, and scaling decisions.
  • Data Plane (Index Pods): The core where vectors are stored and indexed. These are distributed across multiple nodes (pods), each responsible for a shard of the overall index. Pinecone often uses optimized versions of HNSW or other graph-based ANN algorithms tuned for performance and memory efficiency.
  • Storage Layer: Persistent storage for vectors and their metadata, often leveraging cloud-native storage solutions for durability and consistency.
  • Query Engine: Distributes incoming queries across relevant index pods, aggregates results, and performs any necessary re-ranking or filtering.

Being fully managed means Pinecone handles all aspects of scaling (up and down), load balancing, and fault tolerance automatically. Users interact with it purely through an API.
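The scatter-gather step performed by a query engine can be illustrated generically. Pinecone’s internals are proprietary, so the sketch below is not its implementation; it only shows the general pattern of merging per-shard top-k lists into a global top-k, using made-up scores and IDs.

```python
import heapq

def merge_shard_results(shard_results, top_k):
    """Combine per-shard (score, id) lists into one global top-k, highest score first."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(top_k, all_hits)

# Each shard returns its own local top results as (similarity score, vector id)
shard_a = [(0.91, "doc7"), (0.80, "doc2")]
shard_b = [(0.95, "doc4"), (0.60, "doc9")]

print(merge_shard_results([shard_a, shard_b], top_k=3))
# [(0.95, 'doc4'), (0.91, 'doc7'), (0.8, 'doc2')]
```

Because each shard only has to return its own best candidates, the merge step stays cheap even when the full index is spread across many nodes.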

Architectural View (in words):


[Your Application] <-- Pinecone Python Client / REST API --> [Pinecone Cloud Service]
                                                                  |
                                                                  V
                                                        +---------------------+
                                                        |   Control Plane     |
                                                        | (Index Mgmt, Auth)  |
                                                        +---------------------+
                                                                  |
                                                                  V
                                                        +---------------------+
                                                        |   Data Plane        |
                                                        | (Distributed Index  |
                                                        |    Pods / Shards)   |
                                                        |  - ANN Algorithms   |
                                                        |  - Vector Storage   |
                                                        |  - Metadata Store   |
                                                        +---------------------+
                                                                  |
                                                                  V
                                                        [Cloud Storage (S3/GCS)]

Key Features

  • Fully Managed: No infrastructure to provision or manage.
  • Scalability: Designed to scale to billions of vectors and high query throughput. Automatically scales based on demand.
  • Real-time Updates: Supports efficient upserts (updates/inserts) and deletes, making it suitable for dynamic datasets.
  • Metadata Filtering: Robust filtering capabilities for precise search.
  • Hybrid Search: Seamlessly combines semantic and keyword/metadata search.
  • Multi-tenant: Securely manage multiple indexes and projects.
  • Developer-Friendly API: Simple Python client and REST API.

Use Cases

  • Large-Scale Semantic Search: Powering search engines for e-commerce, documentation, or content platforms.
  • Recommendation Systems: Personalizing product recommendations, content suggestions, or user matching.
  • Generative AI & RAG: Providing contextual information to LLMs at scale.
  • Anomaly Detection: Identifying unusual patterns in high-dimensional data streams.
  • Real-time AI Applications: Any application requiring low-latency similarity search on dynamic datasets.

Code Example with Pinecone

Using Pinecone for vector storage and querying:


from pinecone import Pinecone, PodSpec
from sentence_transformers import SentenceTransformer
import time

# 1. Initialize Pinecone client
# Replace with your actual API key. In the current Python client the environment
# is supplied through the index spec (see PodSpec in step 4), not the constructor.
# pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
# For demonstration, we mock the client so the example runs without an account.
# In a real scenario, you'd uncomment the line above and drop the mock.
print("Using a mock Pinecone client for this demonstration.")

# Mock Pinecone client for local execution if not configured for real API
class MockPineconeClient:
    def __init__(self):
        self.indexes = {}

    def create_index(self, name, dimension, metric, spec):
        print(f"Mock: Creating index '{name}' with dim {dimension}, metric {metric}")
        self.indexes[name] = MockPineconeIndex(name, dimension, metric)

    def Index(self, name):
        if name not in self.indexes:
            raise ValueError(f"Mock: Index '{name}' does not exist.")
        return self.indexes[name]

    def list_indexes(self):
        return [{"name": name} for name in self.indexes.keys()]

    def delete_index(self, name):
        if name in self.indexes:
            del self.indexes[name]
            print(f"Mock: Deleted index '{name}'")

class MockPineconeIndex:
    def __init__(self, name, dimension, metric):
        self.name = name
        self.dimension = dimension
        self.metric = metric
        self.data = {} # {id: {"values": [...], "metadata": {...}}}

    def upsert(self, vectors, namespace=None):
        print(f"Mock: Upserting {len(vectors)} vectors to index '{self.name}'")
        for vec_id, vec_values, vec_metadata in vectors:
            self.data[vec_id] = {"values": vec_values, "metadata": vec_metadata}

    def query(self, vector=None, id=None, top_k=5, include_values=False, include_metadata=False, filter=None, namespace=None):
        print(f"Mock: Querying index '{self.name}' with top_k={top_k}")
        results = []
        query_vec = vector

        # Simple mock search (brute-force for demonstration)
        # In a real Pinecone index, this would use ANN
        for _id, item in self.data.items():
            match_filter = True
            if filter:
                for key, val in filter.items():
                    if key not in item["metadata"] or item["metadata"][key] != val: # Simple equality filter
                        match_filter = False
                        break

            if match_filter:
                # Calculate simple Euclidean distance for mock
                dist = sum([(a - b)**2 for a, b in zip(query_vec, item["values"])])**0.5
                results.append({
                    "id": _id,
                    "score": 1 / (1 + dist), # Inverse distance as score
                    "values": item["values"] if include_values else [],
                    "metadata": item["metadata"] if include_metadata else {}
                })

        results.sort(key=lambda x: x["score"], reverse=True)
        return {"matches": results[:top_k]}

pc = MockPineconeClient() # Use mock client for this example

# 2. Get an embedding function (using a local Sentence Transformer model for simplicity)
model = SentenceTransformer('all-MiniLM-L6-v2')
def get_sentence_embedding(text):
    return model.encode(text).tolist()

# 3. Define index parameters
index_name = "khadervali-blog-index"
dimension = 384 # 'all-MiniLM-L6-v2' outputs 384-dim vectors
metric = "cosine" # Or 'euclidean', 'dotproduct'

# 4. Create index if it doesn't exist
if index_name not in [idx["name"] for idx in pc.list_indexes()]:
    pc.create_index(
        name=index_name,
        dimension=dimension,
        metric=metric,
        spec=PodSpec(environment="gcp-starter") # 'starter' environment for free tier
    )
    print(f"Created Pinecone index '{index_name}'. Waiting for it to be ready...")
    time.sleep(10) # Wait for index to initialize (in real Pinecone)
else:
    print(f"Pinecone index '{index_name}' already exists.")

index = pc.Index(index_name)
# 5. Prepare data for upsert
data = [
("doc1


Written by

Khader Vali

Senior Software Engineer specializing in cloud architecture, real-time systems, and enterprise-scale applications.
