
Embeddings and vector stores

Last update: May 18, 2026
Embeddings turn document chunks into searchable vectors. This article explains how the ingestion and retrieval pipelines work in ColdFusion, how to match embedding model dimensions to your vector store schema, and how to configure every supported store for production use.
ColdFusion typically configures embeddings and the vector store in one place: the vectorStore struct or object contains a nested embeddingModel struct, so ingestion and queries use the same model, with dimensions that are compatible with the store.

How embeddings work

An embedding is a dense vector that represents a piece of text (here, a document chunk) in a continuous space. An embedding model maps a string to a vector of fixed dimension (for example, 384 for many small models).
A vector store persists those vectors (and associated metadata) so similarity search can retrieve the most relevant chunks at query time. RAG ingestion computes embeddings for each segment and writes them to the store. Retrieval embeds the user question (or a transformed query) and searches the same store.
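To make "similarity search" concrete, the following is a minimal sketch of cosine similarity, one common way to compare two embeddings. The function name cosineSimilarity and the toy three-dimensional vectors are assumptions for illustration only; a vector store computes this (or another metric) internally, and real embeddings have hundreds of dimensions.
<cfscript>
  // Illustrative only: cosine similarity between two equal-length numeric arrays.
  function cosineSimilarity(required array a, required array b) {
    if (arrayLen(a) != arrayLen(b)) {
      throw(type="DimensionMismatch", message="Vectors must have the same dimension.");
    }
    var dot = 0;
    var normA = 0;
    var normB = 0;
    for (var i = 1; i <= arrayLen(a); i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    // A zero vector has no direction, so treat its similarity as 0.
    if (normA == 0 || normB == 0) {
      return 0;
    }
    return dot / (sqr(normA) * sqr(normB));
  }

  // Toy vectors standing in for an embedded question and two embedded chunks.
  question = [1, 0, 1];
  chunkA = [1, 0, 1];  // points in the same direction as the question
  chunkB = [0, 1, 0];  // orthogonal to the question

  scoreA = cosineSimilarity(question, chunkA); // approximately 1
  scoreB = cosineSimilarity(question, chunkB); // 0
  writeOutput("chunkA: " & scoreA & ", chunkB: " & scoreB);
</cfscript>
The guard at the top of the function is also why embedding dimensions matter: vectors of different lengths cannot be compared at all.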
There are two distinct pipelines that run at different times:
Ingestion pipeline (runs once, or when documents change)
  • Documents are loaded from file paths, folders, or URLs.
  • Each document is split into overlapping chunks (e.g. 1000 characters with 200-character overlap).
  • Each chunk is passed to an embedding model, which returns a vector.
  • Vectors and their source text are stored in the vector store.
  • ColdFusion returns a Future immediately, and ingestion continues in the background.
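As a rough illustration of the chunking step in the ingestion pipeline above, the following sketch splits a string into fixed-size character chunks that overlap. The function name splitWithOverlap is hypothetical, and this is not how ColdFusion's splitter is implemented (a real splitter would likely respect sentence or paragraph boundaries); it only shows what chunk size and overlap mean.
<cfscript>
  // Illustrative only: naive fixed-width character chunking with overlap.
  function splitWithOverlap(required string text, required numeric chunkSize, required numeric chunkOverlap) {
    if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
      throw(type="InvalidArgument", message="Need chunkSize > 0 and 0 <= chunkOverlap < chunkSize.");
    }
    var chunks = [];
    var step = chunkSize - chunkOverlap; // how far the window advances each time
    var start = 1;                       // CFML strings are 1-indexed
    while (start <= len(text)) {
      arrayAppend(chunks, mid(text, start, chunkSize));
      // Stop once the current chunk reaches the end of the text.
      if (start + chunkSize > len(text)) {
        break;
      }
      start += step;
    }
    return chunks;
  }

  // A 10-character string with chunkSize 4 and overlap 2
  // yields "abcd", "cdef", "efgh", "ghij".
  chunks = splitWithOverlap("abcdefghij", 4, 2);
  writeOutput(arrayToList(chunks, " | "));
</cfscript>
Each chunk repeats the last two characters of the previous one, so a sentence that straddles a chunk boundary still appears whole in at least one chunk when the overlap is large enough.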
Retrieval pipeline (runs on every user query)
  • The user's question is converted to a vector using the same embedding model.
  • The vector store performs a similarity search and returns the top N matching chunks.
  • Those chunks are assembled into a context block and injected into the prompt.
  • The language model generates a response grounded in the retrieved context.
  • Output guardrails (if configured) validate the response before it is returned.
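The middle of the retrieval pipeline above (select the top matches, then assemble a context block) can be sketched as follows. The hits array, its scores, and the prompt wording are hypothetical; the variable names maxResults and minScore simply mirror the retriever settings used elsewhere in this article. ColdFusion performs these steps internally, so this is a conceptual sketch rather than code you need to write.
<cfscript>
  // Hypothetical search hits; a real store returns these from its similarity search.
  hits = [
    { text: "Unrelated release notes.", score: 0.12 },
    { text: "Open the Edit payment method window.", score: 0.82 },
    { text: "Select Save to apply your changes.", score: 0.41 }
  ];
  question = "How to update TIN?";
  maxResults = 2;
  minScore = 0.3;

  // Drop hits below the similarity threshold.
  relevant = arrayFilter(hits, function(hit) {
    return hit.score >= minScore;
  });

  // Sort in place, highest score first.
  arraySort(relevant, function(a, b) {
    if (a.score == b.score) {
      return 0;
    }
    return a.score > b.score ? -1 : 1;
  });

  // Keep at most maxResults chunks and join them into one context block.
  contextParts = [];
  for (i = 1; i <= min(maxResults, arrayLen(relevant)); i++) {
    arrayAppend(contextParts, relevant[i].text);
  }
  contextBlock = arrayToList(contextParts, chr(10));

  // Inject the context into the prompt that is sent to the language model.
  prompt = "Answer using only this context:" & chr(10) & contextBlock & chr(10) & "Question: " & question;
  writeOutput(prompt);
</cfscript>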
The vector store acts as the embedding store: text chunks are converted to vectors and stored there for similarity search. With provider set to INMEMORY, the store lives in process memory, which is simple but not durable across restarts. The nested embeddingModel is the model used to turn text into vectors. It must be the same for ingestion (indexing) and query (searching); otherwise the dimensions or providers will not line up.

Match embedding models to vector store dimensions

If the embedding dimension does not match what the index or collection expects, ingestion or search can fail or return wrong results. The examples in this article set dimension: 384 for Milvus and Qdrant when using all-minilm-style models. For Pinecone, dimension is set inside the serverless struct. Always match the model's output size to the store schema.
The following table shows common embedding models and their output dimensions:
| Provider | Model name | Output dimension / notes |
| --- | --- | --- |
| OpenAI | text-embedding-3-small | 1536 dimensions (default). Cost-effective and high quality. |
| OpenAI | text-embedding-ada-002 | 1536 dimensions. Previous generation; still widely used. |
| Ollama (local) | all-minilm | 384 dimensions. Set dimension: 384 in Milvus and Qdrant configs when using this model. |
Note: Always use the same embedding model for both ingestion and retrieval. Switching models after ingestion requires re-indexing all documents, because the vector dimensions and semantic space will differ.
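One way to catch a mismatch early is to compare the configured store dimension with the model's known output size before ingesting. The sketch below is an assumption-laden illustration: the helper assertDimensionMatch and the knownDimensions lookup are hypothetical and not part of ColdFusion, and the check only covers configs that carry a top-level dimension key (as the Qdrant and Milvus examples in this article do).
<cfscript>
  // Output dimensions for the models listed in the table above.
  knownDimensions = {
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "all-minilm": 384
  };

  // Hypothetical helper: throws if the store dimension disagrees with the model.
  function assertDimensionMatch(required struct storeConfig, required struct dimensions) {
    var modelName = storeConfig.embeddingModel.modelName;
    if (!structKeyExists(dimensions, modelName)) {
      throw(type="UnknownModel", message="No known dimension for model " & modelName);
    }
    if (structKeyExists(storeConfig, "dimension") && storeConfig.dimension != dimensions[modelName]) {
      throw(
        type="DimensionMismatch",
        message="Store expects " & storeConfig.dimension & " but " & modelName & " produces " & dimensions[modelName]
      );
    }
    return true;
  }

  // A Qdrant-style config struct (connection details omitted for brevity).
  qdrantConfig = {
    provider: "qdrant",
    dimension: 384,
    embeddingModel: { provider: "ollama", modelName: "all-minilm" }
  };
  assertDimensionMatch(qdrantConfig, knownDimensions); // passes: 384 matches all-minilm
</cfscript>
Changing dimension to 1536 in this config while keeping all-minilm would make the helper throw before any vectors are written.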
Simple RAG: passing vectorStore and embeddingModel
You can pass a fully built vectorStore (with its nested embeddingModel) in the third argument to simpleRAG(). Some applications also pass a top-level embeddingModel in the simpleRAG() options (for example, to address splitting concerns). Prefer a single authoritative embedding definition so that ingestion and retrieval cannot use mismatched models.
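<cfscript>
  // chatModel and vectorStore are assumed to be configured earlier;
  // vectorStore carries the nested embeddingModel.
  ragService = simpleRAG(
    [
      application.getDocumentsDir() & "pdf",
      application.getDocumentsDir() & "txt"
    ],
    chatModel,
    { vectorStore: vectorStore }
  );
  ragService.ingest();
</cfscript>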
Advanced RAG: same VectorStore for ingest and retrieval
Create one VectorStore instance. Pass it to vectorStoreIngestor and to each contentRetrievers entry, so queries search the same index you populated.
<cfscript>
  sharedVS = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  svc = agent({
    CHATMODEL: chatModel,
    ingestion: {
      source: expandPath("./Documents/test.txt"),
      documentSplitter: { chunkSize: 500, chunkOverlap: 100 },
      vectorStoreIngestor: { vectorStore: sharedVS }
    },
    retrievalAugmentor: {
      queryRouter: {
        contentRetrievers: [{
          vectorStore: sharedVS,
          maxResults: 5,
          minScore: 0.3,
          description: "Knowledge base"
        }]
      }
    }
  });
  svc.ingest();
  answer = svc.chat("Your question");
</cfscript>

InMemory store (development/zero-config)

INMEMORY keeps vectors in process memory and needs no external server. Data is typically lost on restart, so this store is best suited to local development.
| Parameter | Description |
| --- | --- |
| provider | Set to "INMEMORY". |
| embeddingModel | Nested struct defining the provider, model name, and API key for generating embeddings. |
<cfscript>
  chatModel = ChatModel({
    provider: "openai",
    modelName: "gpt-4o-mini",
    apiKey: application.apiKey,
    temperature: 0.7
  });
  vectorStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
      provider: "openai",
      modelName: "text-embedding-3-small",
      apiKey: application.apiKey
    }
  });
  docsDir = expandPath("./docs/");
  ragService = simpleRAG(
    docsDir,
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
  answer = ragService.chat("How to update TIN?");
  writeOutput(answer.message);
</cfscript>
Output
To update your Tax Identification Number (TIN) in your Adobe account, follow these steps: Open the Edit payment method window and update your business tax identification number in the tax ID field. You need to select 'Edit' or 'Add new' and add or edit card details to access the option to add or update the tax identification number. The business tax identification number will assist in identifying the tax treatment for your purchases and orders. Once you've made your changes, select 'Save'. Note that the name of the field may vary based on the applicable tax identification number in your country, such as VAT, GST, or NIT.
Warning: The default in-memory vector store is not suitable for production. Documents must be re-indexed on every application restart. For production, configure a persistent vector store (Milvus, Qdrant, Chroma, or Pinecone).

Chroma

Chroma is an external vector database. Pass these parameters: url, databaseName, tenantName, collectionName, and embeddingModel.
| Parameter | Description |
| --- | --- |
| provider | Set to "chroma". |
| url | URL of the Chroma server. |
| databaseName | Name of the Chroma database. |
| tenantName | Chroma tenant name. |
| collectionName | Collection to store and retrieve vectors. Use a unique name per run to avoid index collisions. |
| embeddingModel | Nested struct: provider, modelName, and connection details (e.g. baseUrl for Ollama). |
<cfscript>
  vectorStoreClient = VectorStore({
    provider: "chroma",
    url: application.vectorDB.chroma.url,
    databaseName: application.vectorDB.chroma.databaseName,
    tenantName: application.vectorDB.chroma.tenantName,
    collectionName: "simplerag_nested_test_" & dateFormat(now(), "yyyymmdd"),
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    {
      vectorStore: vectorStoreClient,
      recursive: true,
      continueOnError: false,
      chunkSize: 200,
      chunkOverlap: 50
    }
  );
  ragService.ingest();
</cfscript>

Qdrant

Qdrant uses url, apiKey, collectionName, metricType, dimension, and embeddingModel.
| Parameter | Description |
| --- | --- |
| provider | Set to "qdrant". |
| url | gRPC URL of the Qdrant server. |
| apiKey | API key for authenticating with Qdrant. |
| collectionName | Collection name. Use a unique name per run to avoid collisions. |
| metricType | Similarity metric. Typically "COSINE". |
| dimension | Vector dimension. Must match the embedding model output; set 384 for all-minilm. |
| embeddingModel | Nested struct: provider, modelName, and connection details. |
<cfscript>
  vectorStore = {
    provider: "qdrant",
    url: application.vectorDB.qdrant.grpcUrl,
    apiKey: application.vectorDB.qdrant.apiKey,
    collectionName: "simplerag_nested_test_" & dateFormat(now(), "yyyymmdd"),
    metricType: "COSINE",
    dimension: 384,
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  };
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
</cfscript>

Milvus

Milvus uses url, databaseName, collectionName, dimension, indexType, metricType, and embeddingModel.
| Parameter | Description |
| --- | --- |
| provider | Set to "milvus". |
| url | gRPC URL of the Milvus server. |
| databaseName | Milvus database name. Typically "default". |
| collectionName | Collection name. Use a unique name per run. |
| dimension | Vector dimension. Must match the embedding model output; set 384 for all-minilm. |
| indexType | Index algorithm. Typically "HNSW" for approximate nearest-neighbour search. |
| metricType | Similarity metric. Typically "COSINE". |
| embeddingModel | Nested struct: provider, modelName, and connection details. |
<cfscript>
  vectorStore = {
    provider: "milvus",
    url: application.vectorDB.milvus.grpcUrl,
    databaseName: "default",
    collectionName: "simplerag_nested_test_" & dateFormat(now(), "yyyymmdd"),
    dimension: 384,
    indexType: "HNSW",
    metricType: "COSINE",
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  };
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
</cfscript>

Pinecone

Pinecone uses apiKey, index, serverless (with dimension, cloud, region, deletionProtection), and embeddingModel. You may use vectorStore.deleteCollection() after ingest to clean up.
| Parameter | Description |
| --- | --- |
| provider | Set to "pinecone". |
| apiKey | Pinecone API key. |
| index | Name of the Pinecone index. |
| serverless | Nested struct for serverless configuration. dimension: vector dimension (must match the embedding model); cloud: cloud provider (e.g. aws); region: deployment region; deletionProtection: set to "disabled" to allow cleanup. |
| embeddingModel | Nested struct: provider, modelName, and connection details. |
<cfscript>
  vectorStore = VectorStore({
    provider: "pinecone",
    apiKey: application.vectorDB.pinecone.apiKey,
    index: application.vectorDB.pinecone.index,
    serverless: {
      dimension: application.vectorDB.pinecone.serverless.dimension,
      cloud: application.vectorDB.pinecone.serverless.cloud,
      region: application.vectorDB.pinecone.serverless.region,
      deletionProtection: "disabled"
    },
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
  // Optional cleanup: removes the collection that was just populated.
  vectorStore.deleteCollection();
</cfscript>

Standalone ingest with documentService().ingest()

Load and split with documentService(), then call ingest(segments, vectorStoreClient, options) with batchSize and continueOnError. This pattern gives you full control over each pipeline stage before writing to the vector store.
| Parameter | Description |
| --- | --- |
| segments | Array of segment structs returned by split(). Each segment has text and metadata. |
| vectorStoreClient | A configured VectorStore object to write embeddings into. |
| batchSize | How many segments to embed and write per internal batch. Higher values can improve throughput but increase memory use. |
| continueOnError | When true, ingestion skips or logs failed segments and continues. When false, the job stops on the first error. |
<cfscript>
  docService = documentService();
  documents = docService.load({
    path: application.getDocumentsDir(),
    pattern: "*.txt"
  });
  segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
  vectorStoreClient = VectorStore({
    provider: "qdrant",
    url: application.vectorDB.qdrant.grpcUrl,
    apiKey: application.vectorDB.qdrant.apiKey,
    collectionName: "dps_ingest_qdrant_test",
    metricType: "COSINE",
    dimension: 384,
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  result = docService.ingest(segments, vectorStoreClient, {
    batchSize: 50,
    continueOnError: true
  });
  writeOutput(result.segmentsIngested & " ingested, " & result.segmentsFailed & " failed");
</cfscript>
The result struct contains segmentsIngested (count of segments successfully written to the store) and segmentsFailed (count of segments that could not be embedded or persisted).
Note: documentService().ingest() is the standalone equivalent of the vectorStoreIngestor stage inside agent(). Use it when you want to load, split, optionally transform, and then ingest independently, without building a full agent() pipeline.
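To illustrate what batchSize and continueOnError mean, here is a conceptual sketch of a batched loop that counts successes and failures. It is not ColdFusion's implementation: ingestInBatches and writeSegment are hypothetical names, and writeSegment is a stand-in that fails on empty text instead of calling an embedding model. The result keys segmentsIngested and segmentsFailed mirror the result struct described above.
<cfscript>
  // Hypothetical stand-in for "embed and persist one segment".
  function writeSegment(required struct segment) {
    if (!len(trim(segment.text))) {
      throw(type="EmptySegment", message="Segment has no text.");
    }
  }

  // Illustrative only: walk the segments in batches of batchSize.
  function ingestInBatches(required array segments, required numeric batchSize, required boolean continueOnError) {
    var result = { segmentsIngested: 0, segmentsFailed: 0 };
    for (var first = 1; first <= arrayLen(segments); first += batchSize) {
      var last = min(first + batchSize - 1, arrayLen(segments));
      for (var i = first; i <= last; i++) {
        try {
          writeSegment(segments[i]);
          result.segmentsIngested++;
        } catch (any e) {
          result.segmentsFailed++;
          // With continueOnError false, the first failure aborts the whole job.
          if (!continueOnError) {
            rethrow;
          }
        }
      }
    }
    return result;
  }

  sampleSegments = [ { text: "alpha" }, { text: "" }, { text: "beta" } ];
  summary = ingestInBatches(sampleSegments, 2, true);
  // One segment is empty, so 2 are ingested and 1 fails.
  writeOutput(summary.segmentsIngested & " ingested, " & summary.segmentsFailed & " failed");
</cfscript>
With continueOnError set to false, the same input would stop at the empty segment and rethrow the error instead of returning a summary.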
