# Milvus - Vector Store
Use Milvus as a vector store for RAG.
## Quick Start
You need three things:
- A Milvus instance (cloud or self-hosted)
- An embedding model (to convert your queries to vectors)
- A Milvus collection with vector fields (see the sketch below for creating one)
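If you don't have a collection yet, here is a minimal sketch of creating one with the pymilvus client (assumes pymilvus >= 2.4; the endpoint, token, and dimension are placeholders to replace with your own):

```python
from pymilvus import MilvusClient

# Connect to your Milvus instance (cloud or self-hosted).
client = MilvusClient(
    uri="https://your-milvus-instance.milvus.io",
    token="your-milvus-api-key",
)

# Quick setup: creates a collection with an auto "id" primary key and a
# default vector field. The dimension must match your embedding model's
# output (e.g., 3072 for text-embedding-3-large).
client.create_collection(
    collection_name="my-collection-name",
    dimension=3072,
)
```

The quick-setup path names the vector field "vector" by default; the examples below assume a custom schema with a "book_intro" text field and a "book_intro_vector" vector field, which you would define with pymilvus's full schema API instead.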
## Usage
You can query Milvus directly from the Python SDK, or through the LiteLLM Proxy (see Setup Config below).
### Basic Search
```python
from litellm import vector_stores
import os

# Set your credentials
os.environ["MILVUS_API_KEY"] = "your-milvus-api-key"
os.environ["MILVUS_API_BASE"] = "https://your-milvus-instance.milvus.io"

# Search the vector store
response = vector_stores.search(
    vector_store_id="my-collection-name",  # Your Milvus collection name
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="azure/text-embedding-3-large",
    litellm_embedding_config={
        "api_base": "your-embedding-endpoint",
        "api_key": "your-embedding-api-key",
        "api_version": "2025-09-01"
    },
    milvus_text_field="book_intro",  # Field name that contains text content
    api_key=os.getenv("MILVUS_API_KEY"),
)
print(response)
```
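Here, `litellm_embedding_model` and `litellm_embedding_config` control how the query string is turned into a vector, while `milvus_text_field` names the scalar field in your collection whose contents are returned as the text of each result.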
### Async Search
```python
from litellm import vector_stores
import os

response = await vector_stores.asearch(
    vector_store_id="my-collection-name",
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="azure/text-embedding-3-large",
    litellm_embedding_config={
        "api_base": "your-embedding-endpoint",
        "api_key": "your-embedding-api-key",
        "api_version": "2025-09-01"
    },
    milvus_text_field="book_intro",
    api_key=os.getenv("MILVUS_API_KEY"),
)
print(response)
```
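Note that `asearch` must be awaited from inside a coroutine (or an environment that supports top-level `await`, such as a notebook); in a plain script, wrap the call in an `async def` function and run it with `asyncio.run()`.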
### Advanced Options
```python
from litellm import vector_stores
import os

response = vector_stores.search(
    vector_store_id="my-collection-name",
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="azure/text-embedding-3-large",
    litellm_embedding_config={
        "api_base": "your-embedding-endpoint",
        "api_key": "your-embedding-api-key",
    },
    milvus_text_field="book_intro",
    api_key=os.getenv("MILVUS_API_KEY"),
    # Milvus-specific parameters
    limit=10,  # Number of results to return
    offset=0,  # Pagination offset
    dbName="default",  # Database name
    annsField="book_intro_vector",  # Vector field name
    outputFields=["id", "book_intro", "title"],  # Fields to return
    filter='book_id > 0',  # Metadata filter expression
    searchParams={"metric_type": "L2", "params": {"nprobe": 10}},  # Search parameters
)
print(response)
```
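A note on the Milvus-specific knobs: the `metric_type` in `searchParams` should match the metric your collection's index was built with (Milvus supports L2, IP, and COSINE), and index-dependent parameters such as `nprobe` trade recall for latency on IVF indexes. The `filter` string uses Milvus boolean expression syntax, so expressions like `book_id in [1, 2, 3] and word_count > 1000` also work.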
### Setup Config
Add this to your config.yaml:
```yaml
vector_store_registry:
  - vector_store_name: "milvus-knowledgebase"
    litellm_params:
      vector_store_id: "my-collection-name"
      custom_llm_provider: "milvus"
      api_key: os.environ/MILVUS_API_KEY
      api_base: https://your-milvus-instance.milvus.io
      litellm_embedding_model: "azure/text-embedding-3-large"
      litellm_embedding_config:
        api_base: https://your-endpoint.cognitiveservices.azure.com/
        api_key: os.environ/AZURE_API_KEY
        api_version: "2025-09-01"
      milvus_text_field: "book_intro"
      # Optional Milvus parameters
      annsField: "book_intro_vector"
      limit: 10
```
### Start Proxy
```bash
litellm --config /path/to/config.yaml
```
### Search via API
```bash
curl -X POST 'http://0.0.0.0:4000/v1/vector_stores/my-collection-name/search' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "query": "What is the capital of France?"
}'
```
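Since the proxy exposes the OpenAI-compatible search endpoint, you can also call it from the OpenAI Python SDK. A minimal sketch, assuming an SDK version recent enough to include `vector_stores.search`:

```python
from openai import OpenAI

# Point the OpenAI client at your LiteLLM proxy.
client = OpenAI(base_url="http://0.0.0.0:4000/v1", api_key="sk-1234")

results = client.vector_stores.search(
    vector_store_id="my-collection-name",
    query="What is the capital of France?",
)
print(results)
```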
## Required Parameters
| Parameter | Type | Description |
|---|---|---|
| vector_store_id | string | Your Milvus collection name |
| custom_llm_provider | string | Set to "milvus" |
| litellm_embedding_model | string | Model used to generate query embeddings (e.g., "azure/text-embedding-3-large") |
| litellm_embedding_config | dict | Config for the embedding model (api_base, api_key, api_version) |
| milvus_text_field | string | Field name in your collection that contains the text content |
| api_key | string | Your Milvus API key (or set the MILVUS_API_KEY env var) |
| api_base | string | Your Milvus API base URL (or set the MILVUS_API_BASE env var) |
## Optional Parameters
| Parameter | Type | Description |
|---|---|---|
| dbName | string | Database name (default: "default") |
| annsField | string | Vector field name to search (default: "book_intro_vector") |
| limit | integer | Maximum number of results to return |
| offset | integer | Pagination offset |
| filter | string | Filter expression for metadata filtering |
| groupingField | string | Field to group results by |
| outputFields | list | List of fields to return in results |
| searchParams | dict | Milvus search parameters, such as metric type and nprobe |
| partitionNames | list | List of partition names to search |
| consistencyLevel | string | Consistency level for the search |
## Supported Features
| Feature | Status | Notes |
|---|---|---|
| Logging | ✅ Supported | Full logging support available |
| Guardrails | ❌ Not Yet Supported | Guardrails are not currently supported for vector stores |
| Cost Tracking | ✅ Supported | Cost is $0 for Milvus searches |
| Unified API | ✅ Supported | Call via OpenAI-compatible /v1/vector_stores/search endpoint |
| Passthrough | ❌ Not Yet Supported | |
## Response Format
The response follows the standard LiteLLM vector store format:
```json
{
  "object": "vector_store.search_results.page",
  "search_query": "What is the capital of France?",
  "data": [
    {
      "score": 0.95,
      "content": [
        {
          "text": "Paris is the capital of France...",
          "type": "text"
        }
      ],
      "file_id": null,
      "filename": null,
      "attributes": {
        "id": "123",
        "title": "France Geography"
      }
    }
  ]
}
```
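For RAG, you typically just need the text of each hit. A minimal sketch of flattening the results into a prompt context, assuming the dict-like shape shown above:

```python
# Collect the text parts of every result, then join them into one context block.
chunks = [
    part["text"]
    for result in response["data"]
    for part in result["content"]
    if part["type"] == "text"
]
context = "\n\n".join(chunks)
```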
## How It Works
When you search:

1. LiteLLM converts your query to a vector using the embedding model you specified
2. It sends the vector to your Milvus instance via the /v2/vectordb/entities/search endpoint
3. Milvus finds the most similar documents in your collection using vector similarity search
4. Results come back with distance scores

The embedding model can be any model supported by LiteLLM: Azure OpenAI, OpenAI, Bedrock, etc.
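For example, to embed queries with OpenAI instead of Azure (a sketch; the model name is illustrative, and the embedding model's output dimension must match your collection's vector field):

```python
import os

from litellm import vector_stores

response = vector_stores.search(
    vector_store_id="my-collection-name",
    query="What is the capital of France?",
    custom_llm_provider="milvus",
    litellm_embedding_model="openai/text-embedding-3-small",
    litellm_embedding_config={"api_key": os.getenv("OPENAI_API_KEY")},
    milvus_text_field="book_intro",
    api_key=os.getenv("MILVUS_API_KEY"),
)
```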