Embeddings and vector stores

Last update: May 18, 2026

Embeddings turn document chunks into searchable vectors. This article explains how the ingestion and retrieval pipelines work in ColdFusion, how to match embedding model dimensions to your vector store schema, and how to configure every supported store for production use.

ColdFusion typically configures embeddings and vector stores in one place: the vectorStore struct or object includes a nested embeddingModel struct, so ingestion and queries use the same model and its dimensions stay compatible with the store.

How embeddings work

Embeddings are dense vectors that represent text chunks in a continuous space. An embedding model maps a string to a vector of fixed dimension (for example, 384 for many small models), and texts with similar meaning map to nearby vectors.

A vector store persists those vectors (and associated metadata) so similarity search can retrieve the most relevant chunks at query time. RAG ingestion computes embeddings for each segment and writes them to the store. Retrieval embeds the user question (or a transformed query) and searches the same store.
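
To make "similarity search" concrete, the following sketch computes cosine similarity between two small vectors in plain CFML. It is an illustration only: the function name cosineSimilarity and the three-element vectors are hypothetical, real embeddings have hundreds of dimensions, and the vector store performs this comparison for you.

<cfscript>
  // Illustrative helper (hypothetical; not part of the ColdFusion AI API).
  // Returns the cosine of the angle between two equal-length numeric arrays.
  function cosineSimilarity(required array a, required array b) {
    if (arrayLen(a) != arrayLen(b)) {
      throw(message="Vectors must have the same dimension.");
    }
    var dot = 0;
    var normA = 0;
    var normB = 0;
    for (var i = 1; i <= arrayLen(a); i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0; // a zero vector has no direction to compare
    }
    return dot / (sqr(normA) * sqr(normB));
  }

  // Made-up vectors standing in for an embedded question and an embedded chunk
  queryVector = [0.1, 0.8, 0.3];
  chunkVector = [0.2, 0.7, 0.4];
  writeOutput(cosineSimilarity(queryVector, chunkVector));
</cfscript>

Values closer to 1 mean the two vectors point in nearly the same direction, which is what a store configured with the "COSINE" metric treats as a close match.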

There are two distinct pipelines that run at different times:

Ingestion pipeline (runs once, or when documents change)

1. Documents are loaded from file paths, folders, or URLs.
2. Each document is split into overlapping chunks (for example, 1000 characters with a 200-character overlap).
3. Each chunk is passed to an embedding model, which returns a vector.
4. Vectors and their source text are stored in the vector store.
5. ColdFusion returns a Future immediately, and ingestion continues in the background.
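
The splitting step can be sketched in a few lines of plain CFML. This is a simplified character-based splitter written for illustration, assuming fixed-size windows. The function name splitIntoChunks is hypothetical, and ColdFusion's own splitter may behave differently (for example, by respecting sentence or paragraph boundaries).

<cfscript>
  // Illustrative splitter (hypothetical; not the ColdFusion implementation).
  // Cuts text into windows of chunkSize characters, each starting
  // chunkSize - chunkOverlap characters after the previous one.
  function splitIntoChunks(required string text, numeric chunkSize = 1000, numeric chunkOverlap = 200) {
    if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
      throw(message="chunkOverlap must be at least 0 and smaller than chunkSize.");
    }
    var chunks = [];
    var step = chunkSize - chunkOverlap; // distance between chunk starts
    var start = 1;                       // CFML strings are 1-indexed
    while (start <= len(text)) {
      arrayAppend(chunks, mid(text, start, chunkSize));
      if (start + chunkSize > len(text)) {
        break; // this chunk already reached the end of the text
      }
      start += step;
    }
    return chunks;
  }

  sample = repeatString("a", 2500);
  chunks = splitIntoChunks(sample, 1000, 200);
  writeOutput(arrayLen(chunks) & " chunks"); // 3 chunks, starting at characters 1, 801, and 1601
</cfscript>

The overlap means the last 200 characters of one chunk are repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.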

Retrieval pipeline (runs on every user query)

1. The user's question is converted to a vector using the same embedding model.
2. The vector store performs a similarity search and returns the top N matching chunks.
3. Those chunks are assembled into a context block and injected into the prompt.
4. The language model generates a response grounded in the retrieved context.
5. Output guardrails (if configured) validate the response before it is returned.
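
The selection and context-assembly steps can be pictured with the sketch below. It assumes the similarity scores have already been computed, and it only mimics what the vector store and agent do internally: buildContext, the sample chunks, and the prompt wording are all hypothetical. The maxResults and minScore arguments mirror the contentRetrievers options of the same names.

<cfscript>
  // Illustrative only (hypothetical helper; not part of the ColdFusion AI API).
  // Keeps chunks scoring at least minScore, sorts them best-first,
  // and joins the text of the top maxResults into one context block.
  function buildContext(required array scoredChunks, numeric maxResults = 5, numeric minScore = 0) {
    var matches = [];
    for (var chunk in scoredChunks) {
      if (chunk.score >= minScore) {
        arrayAppend(matches, chunk);
      }
    }
    // Sort in place, highest score first
    arraySort(matches, function(x, y) {
      return sgn(y.score - x.score);
    });
    var parts = [];
    var limit = min(maxResults, arrayLen(matches));
    for (var i = 1; i <= limit; i++) {
      arrayAppend(parts, matches[i].text);
    }
    return arrayToList(parts, chr(10));
  }

  // Made-up search results for the question below
  scoredChunks = [
    { text: "Refunds take 5 days.", score: 0.82 },
    { text: "Office hours are 9 to 5.", score: 0.12 },
    { text: "Refunds go to the original card.", score: 0.67 }
  ];
  question = "How long do refunds take?";
  context = buildContext(scoredChunks, 2, 0.3);
  prompt = "Answer using only this context:" & chr(10) & context & chr(10) & "Question: " & question;
  writeOutput(prompt);
</cfscript>

Here the low-scoring "Office hours" chunk is dropped by minScore, and the two refund chunks reach the prompt in score order.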

In these configurations, the vector store is where text chunks are stored as vectors for similarity search. The INMEMORY provider keeps the store in process memory, which is simple but not durable across restarts. The nested embeddingModel defines the model that turns text into vectors. It must be the same for ingestion (indexing) and query (searching); otherwise the dimensions or providers will not line up.

Match embedding models to vector store dimensions

If the embedding dimension does not match what the index or collection expects, ingestion or search can fail or return wrong results. The Milvus and Qdrant examples in this article set dimension: 384 because they use the all-minilm model. For Pinecone, dimension is set inside the serverless struct. Always match the model's output size to the store schema.

The following table shows common embedding models and their output dimensions:

Provider	Model name	Output dimension / notes
OpenAI	text-embedding-3-small	1536 dimensions (default). Cost-effective and high quality.
OpenAI	text-embedding-ada-002	1536 dimensions. Previous generation; still widely used.
Ollama (local)	all-minilm	384 dimensions. Set dimension: 384 in Milvus and Qdrant configs when using this model.

Note: Always use the same embedding model for both ingestion and retrieval. Switching models after ingestion requires re-indexing all documents, because the vector dimensions and semantic space will differ.
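
One way to catch a mismatch early is to compare the configured dimension against a lookup of known model sizes before creating the store. The sketch below is a hedged example of that idea: dimensionMatches and knownDimensions are hypothetical names, the lookup contains only the models from the table above, and the check covers only a top-level dimension key (as used by Milvus and Qdrant), not Pinecone's nested serverless.dimension.

<cfscript>
  // Output sizes taken from the table above; extend for the models you use.
  knownDimensions = {
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
    "all-minilm": 384
  };

  // Hypothetical pre-flight check; not a ColdFusion built-in.
  // Returns false only when both values are known and they disagree.
  function dimensionMatches(required struct storeConfig, required struct knownDimensions) {
    var modelName = storeConfig.embeddingModel.modelName;
    // Nothing to compare if the store sets no dimension or the model is not in the lookup.
    if (!structKeyExists(storeConfig, "dimension") || !structKeyExists(knownDimensions, modelName)) {
      return true;
    }
    return storeConfig.dimension == knownDimensions[modelName];
  }

  storeConfig = {
    provider: "qdrant",
    dimension: 384,
    embeddingModel: { provider: "ollama", modelName: "all-minilm" }
  };
  if (!dimensionMatches(storeConfig, knownDimensions)) {
    throw(message="Store dimension does not match the embedding model output size.");
  }
  writeOutput("Dimension check passed");
</cfscript>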

Simple RAG: passing vectorStore and embeddingModel

You can pass a fully built vectorStore (with its nested embeddingModel) in the third argument to simpleRAG(). Some applications also pass a top-level embeddingModel in the simpleRAG options to keep the embedding configuration separate from the store. Prefer a single authoritative embedding definition to avoid a mismatch.

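<cfscript>
  // Assumes chatModel and vectorStore (with its nested embeddingModel) are already defined.
  ragService = simpleRAG(
    // Source folders to ingest
    [
      application.getDocumentsDir() & "pdf",
      application.getDocumentsDir() & "txt"
    ],
    chatModel,
    { vectorStore: vectorStore }
  );
  // Load, split, embed, and store the documents
  ragService.ingest();
</cfscript>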

Advanced RAG: same VectorStore for ingest and retrieval

Create one VectorStore instance. Pass it to vectorStoreIngestor and to each contentRetrievers entry, so queries search the same index you populated.

<cfscript>
  // One store instance, shared by ingestion and retrieval
  sharedVS = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  // Assumes chatModel is already defined
  svc = agent({
    chatModel: chatModel,
    ingestion: {
      source: expandPath("./Documents/test.txt"),
      documentSplitter: { chunkSize: 500, chunkOverlap: 100 },
      // Writes embeddings into the shared store
      vectorStoreIngestor: { vectorStore: sharedVS }
    },
    retrievalAugmentor: {
      queryRouter: {
        contentRetrievers: [{
          // Searches the same store that ingestion populated
          vectorStore: sharedVS,
          maxResults: 5,
          minScore: 0.3,
          description: "Knowledge base"
        }]
      }
    }
  });
  svc.ingest();
  answer = svc.chat("Your question");
</cfscript>

InMemory store (development/zero-config)

INMEMORY keeps vectors in process memory and needs no external server, which makes it the easiest option for local development. Data is typically lost on restart.

Parameter	Description
provider	Set to "INMEMORY".
embeddingModel	Nested struct defining the provider, model name, and API key for generating embeddings.

<cfscript>
  chatModel = ChatModel({
    provider: "openai",
    modelName: "gpt-4o-mini",
    apiKey: application.apiKey,
    temperature: 0.7
  });
  // In-memory store; the nested embeddingModel is used for both ingest and query
  vectorStore = VectorStore({
    provider: "INMEMORY",
    embeddingModel: {
      provider: "openai",
      modelName: "text-embedding-3-small",
      apiKey: application.apiKey
    }
  });
  docsDir = expandPath("./docs/");
  ragService = simpleRAG(
    docsDir,
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
  answer = ragService.chat("How to update TIN?");
  writeOutput(answer.message);
</cfscript>

Output

To update your Tax Identification Number (TIN) in your Adobe account, follow these steps: Open the Edit payment method window and update your business tax identification number in the tax ID field. You need to select 'Edit' or 'Add new' and add or edit card details to access the option to add or update the tax identification number. The business tax identification number will assist in identifying the tax treatment for your purchases and orders. Once you've made your changes, select 'Save'. Note that the name of the field may vary based on the applicable tax identification number in your country, such as VAT, GST, or NIT.

Warning: The default in-memory vector store is not suitable for production. Documents must be re-indexed on every application restart. For production, configure a persistent vector store (Milvus, Qdrant, Chroma, or Pinecone).

Chroma

Chroma is an external vector database. Pass these parameters: url, databaseName, tenantName, collectionName, and embeddingModel.

Parameter	Description
provider	Set to "chroma".
url	URL of the Chroma server.
databaseName	Name of the Chroma database.
tenantName	Chroma tenant name.
collectionName	Collection to store and retrieve vectors. Use a unique name per run to avoid index collisions.
embeddingModel	Nested struct: provider, modelName, and connection details (e.g. baseUrl for Ollama).
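
The examples in this article append dateFormat(now(), "yyyymmdd") to the collection name, which is unique per day rather than per run. If you need a name that differs on every run (for example, in automated tests), one option is to append a UUID, as sketched below. The helper uniqueCollectionName is hypothetical, and you should confirm the result against your store's collection naming rules.

<cfscript>
  // Hypothetical helper: builds a collection name that differs on every call.
  function uniqueCollectionName(required string prefix) {
    // createUUID() returns a hyphenated identifier; strip the hyphens and lowercase it.
    return prefix & "_" & lCase(replace(createUUID(), "-", "", "all"));
  }

  collectionName = uniqueCollectionName("simplerag_nested_test");
  writeOutput(collectionName);
</cfscript>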

<cfscript>
  vectorStoreClient = VectorStore({
    provider: "chroma",
    url: application.vectorDB.chroma.url,
    databaseName: application.vectorDB.chroma.databaseName,
    tenantName: application.vectorDB.chroma.tenantName,
    collectionName: "simplerag_nested_test_" & dateFormat(now(), "yyyymmdd"),
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    {
      vectorStore: vectorStoreClient,
      recursive: true,
      continueOnError: false,
      chunkSize: 200,
      chunkOverlap: 50
    }
  );
  ragService.ingest();
</cfscript>

Qdrant

Qdrant uses url, apiKey, collectionName, metricType, dimension, and embeddingModel.

Parameter	Description
provider	Set to "qdrant".
url	gRPC URL of the Qdrant server.
apiKey	API key for authenticating with Qdrant.
collectionName	Collection name. Use a unique name per run to avoid collisions.
metricType	Similarity metric. Typically "COSINE".
dimension	Vector dimension. Must match the embedding model output — set 384 for all-minilm.
embeddingModel	Nested struct: provider, modelName, and connection details.

<cfscript>
  // Plain struct configuration; simpleRAG accepts a vectorStore struct or object
  vectorStore = {
    provider: "qdrant",
    url: application.vectorDB.qdrant.grpcUrl,
    apiKey: application.vectorDB.qdrant.apiKey,
    collectionName: "simplerag_nested_test_" & dateFormat(now(), "yyyymmdd"),
    metricType: "COSINE",
    dimension: 384, // output size of all-minilm
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  };
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
</cfscript>

Milvus

Milvus uses url, databaseName, collectionName, dimension, indexType, metricType, and embeddingModel.

Parameter	Description
provider	Set to "milvus".
url	gRPC URL of the Milvus server.
databaseName	Milvus database name. Typically "default".
collectionName	Collection name. Use a unique name per run.
dimension	Vector dimension. Must match embedding model output — set 384 for all-minilm.
indexType	Index algorithm. Typically "HNSW" for approximate nearest-neighbour search.
metricType	Similarity metric. Typically "COSINE".
embeddingModel	Nested struct: provider, modelName, and connection details.

<cfscript>
  vectorStore = {
    provider: "milvus",
    url: application.vectorDB.milvus.grpcUrl,
    databaseName: "default",
    collectionName: "simplerag_nested_test_" & dateFormat(now(), "yyyymmdd"),
    dimension: 384, // output size of all-minilm
    indexType: "HNSW",
    metricType: "COSINE",
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  };
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
</cfscript>

Pinecone

Pinecone uses apiKey, index, serverless (with dimension, cloud, region, and deletionProtection), and embeddingModel. In test code, you can call vectorStore.deleteCollection() after ingestion to clean up.

Parameter	Description
provider	Set to "pinecone".
apiKey	Pinecone API key.
index	Name of the Pinecone index.
serverless	Nested struct for serverless configuration: • dimension: vector dimension (must match embedding model). • cloud: cloud provider (e.g. aws). • region: deployment region. • deletionProtection: set to "disabled" to allow cleanup.
embeddingModel	Nested struct: provider, modelName, and connection details.

<cfscript>
  vectorStore = VectorStore({
    provider: "pinecone",
    apiKey: application.vectorDB.pinecone.apiKey,
    index: application.vectorDB.pinecone.index,
    serverless: {
      // Must match the embedding model output size (384 for all-minilm)
      dimension: application.vectorDB.pinecone.serverless.dimension,
      cloud: application.vectorDB.pinecone.serverless.cloud,
      region: application.vectorDB.pinecone.serverless.region,
      deletionProtection: "disabled"
    },
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  ragService = simpleRAG(
    expandPath("./Documents/nested"),
    chatModel,
    { vectorStore: vectorStore, recursive: true, chunkSize: 200, chunkOverlap: 50 }
  );
  ragService.ingest();
  // Cleanup step; omit it if you want to keep the ingested data
  vectorStore.deleteCollection();
</cfscript>

Standalone ingest with documentService().ingest()

Load and split with documentService(), then call ingest(segments, vectorStoreClient, options) with batchSize and continueOnError. This pattern gives you full control over each pipeline stage before writing to the vector store.

Parameter	Description
segments	Array of segment structs returned by split(). Each segment has text and metadata.
vectorStoreClient	A configured VectorStore object to write embeddings into.
batchSize	How many segments to embed and write per internal batch. Higher values can improve throughput but increase memory use.
continueOnError	When true, ingestion skips or logs failed segments and continues. When false, the job stops on the first error.
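
To picture what batchSize controls, the sketch below groups an array of segments into batches. It is illustrative only: documentService().ingest() does its own batching, toBatches is a hypothetical helper, and the segments are made-up placeholders with the text and metadata keys described above.

<cfscript>
  // Hypothetical helper showing how a list of segments divides into batches.
  function toBatches(required array items, required numeric batchSize) {
    if (batchSize < 1) {
      throw(message="batchSize must be at least 1.");
    }
    var batches = [];
    var current = [];
    for (var item in items) {
      arrayAppend(current, item);
      if (arrayLen(current) == batchSize) {
        arrayAppend(batches, current);
        current = []; // start a fresh batch
      }
    }
    // Keep the final, partially filled batch
    if (arrayLen(current) > 0) {
      arrayAppend(batches, current);
    }
    return batches;
  }

  // Placeholder segments
  segments = [];
  for (i = 1; i <= 120; i++) {
    arrayAppend(segments, { text: "segment " & i, metadata: {} });
  }
  batches = toBatches(segments, 50);
  writeOutput(arrayLen(batches) & " batches"); // 3 batches: 50, 50, and 20 segments
</cfscript>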

<cfscript>
  docService = documentService();
  // Stage 1: load the source documents
  documents = docService.load({
    path: application.getDocumentsDir(),
    pattern: "*.txt"
  });
  // Stage 2: split the documents into segments
  segments = docService.split(documents, { chunkSize: 500, chunkOverlap: 50 });
  // Stage 3: configure the target store
  vectorStoreClient = VectorStore({
    provider: "qdrant",
    url: application.vectorDB.qdrant.grpcUrl,
    apiKey: application.vectorDB.qdrant.apiKey,
    collectionName: "dps_ingest_qdrant_test",
    metricType: "COSINE",
    dimension: 384,
    embeddingModel: {
      provider: "ollama",
      modelName: "all-minilm",
      baseUrl: application.ollamaBaseUrl
    }
  });
  // Stage 4: embed the segments and write them to the store
  result = docService.ingest(segments, vectorStoreClient, {
    batchSize: 50,
    continueOnError: true
  });
  writeOutput(result.segmentsIngested & " ingested, " & result.segmentsFailed & " failed");
</cfscript>

The result struct contains segmentsIngested (count of segments successfully written to the store) and segmentsFailed (count of segments that could not be embedded or persisted).
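
When continueOnError is true, a job can finish with some segments missing, so it is worth checking the result rather than only printing it. The sketch below shows one way to do that. It reads only the two documented keys; summarizeIngest, the sample numbers, and the "rag-ingest" log file name are hypothetical.

<cfscript>
  // Hypothetical helper: derives a total, a failure rate, and an ok flag
  // from the two counts in the ingest result.
  function summarizeIngest(required struct result) {
    var total = result.segmentsIngested + result.segmentsFailed;
    return {
      total: total,
      failureRate: total > 0 ? result.segmentsFailed / total : 0,
      ok: result.segmentsFailed == 0
    };
  }

  // Stand-in for the struct returned by docService.ingest(); the numbers are made up.
  result = { segmentsIngested: 48, segmentsFailed: 2 };
  summary = summarizeIngest(result);
  if (!summary.ok) {
    writeLog(
      text = "Ingest finished with #result.segmentsFailed# failed segments out of #summary.total#.",
      type = "warning",
      file = "rag-ingest"
    );
  }
</cfscript>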

Note: documentService().ingest() is the standalone equivalent of the vectorStoreIngestor stage inside agent(). Use it when you want to load, split, optionally transform, and then ingest independently, without building a full agent() pipeline.
