VDF AI Data

Vector indexes and semantic search

Build user-controlled indexes from your data so chats, agents, and networks can search by meaning — with the chunking, model, and scope you choose.

When search-by-meaning needs an index of its own

The "Searching your knowledge" page covers how VDF AI searches across your sources by default. That default works well out of the box for documents.

But there are moments when you want more control. You want to decide what gets indexed. You want to choose how text is split into searchable pieces. You want to pick the model that powers the search. And you want to know exactly what your team is searching across when they ask a question.

That’s what a vector index gives you: a search surface you own, scoped to data you choose, powered by a model you pick.

You can ignore this page completely and search will still work. Vector indexes are for the moments when default search isn't sharp enough — usually because you're searching across a database column, a curated feature list, or a very large corpus and want the search experience to match.

Who this is for

  • Data leads building a focused semantic search over a specific dataset.
  • Analysts and operations teams who want to ask natural-language questions over tables.
  • Workspace admins curating a “golden” search index for their team to use across products.

You don’t need any technical background. The screens walk you through every choice.

What a vector index is, in one sentence

A vector index is a searchable layer built from one of your sources, where the search understands meaning rather than just keywords.

You build it. You name it. You pick what goes in. Once built, every other product surface — Chat, Agents, Networks — can search it.

What you can build an index from

The most common sources:

A database asset

Index one or more text-heavy columns of a connected table — descriptions, notes, reviews, transcripts, status comments.

A feature list

Use the curated list you built in [Features and relationships](/docs/products/vdf-ai-data/features-and-relationships) as the scope of the index.

A connected app

Index a specific Confluence space, Jira project, or GitHub repo as a standalone search surface.

A file collection

Group uploaded files and build an index over only that collection.

Building an index — the choices you make

Every index starts from one short form with three meaningful choices.

1. What goes in

Pick the source and scope it as narrowly as you can. A narrower index produces sharper search than a wider one. You can always add a second index later for a different scope.

2. How text is split

VDF AI Data breaks the source into small pieces before it can be searched. Two numbers control how:

  • Chunk size. How much text goes into each piece. Smaller chunks (a few sentences) make searches precise. Larger chunks (a few paragraphs) preserve more context.
  • Overlap. How much of one chunk is repeated at the start of the next. Some overlap helps the search find ideas that span chunk boundaries.

The defaults work for most teams. For the first index you build, accept the defaults. After you've searched it for a few days, you'll know whether the answers feel "too narrow" (raise chunk size) or "too generic" (lower it).
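To make the two settings concrete, here is a minimal sketch of how chunk size and overlap interact. It is an illustration only, not how VDF AI Data splits text internally: the function name `chunk_text` is hypothetical, and measuring chunks in words is an assumption made for readability.

```python
def chunk_text(words, chunk_size, overlap):
    """Split a list of words into chunks of up to `chunk_size` words.

    The last `overlap` words of each chunk are repeated at the start
    of the next one. Word-based sizing is an assumption for this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the final chunk already reaches the end of the text
    return chunks


words = "refunds for EU orders are processed within five business days".split()
chunks = chunk_text(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With an overlap of one word, the word that ends one chunk also opens the next, which is what lets a search pick up an idea that straddles a boundary. Raising `chunk_size` gives each piece more context; lowering it makes each piece more specific.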

A simple guide:

| If your source is… | Try this |
| --- | --- |
| Short text snippets (reviews, support tickets, status comments) | Smaller chunks, low overlap |
| Long-form documents (policies, contracts, articles) | Medium chunks, moderate overlap |
| Mixed-length material | Defaults, which are tuned for this case |

3. Which embedding model

The embedding model is what turns text into a form a search can match by meaning. VDF AI Data has a catalog of models. They differ on three axes:

  • Quality. Some are better at picking up subtle differences in meaning.
  • Speed. Some are faster to index and faster to search.
  • Language coverage. Some are tuned for English, others handle many languages well.

The catalog labels the trade-offs clearly. For most teams, the default model is a solid starting point. Switch when you know why — for example, when your data is mostly in a non-English language and a language-specific model exists.

What happens after you click “Build”

Three stages, all visible in the build log.

  1. Reading.

    VDF AI Data pulls the source content. For a database column, this is a snapshot read. For documents, it's a fetch of the current versions.

  2. Chunking.

    The content is split into searchable pieces according to your chunk-size and overlap settings.

  3. Embedding.

    Each chunk is processed by the embedding model. This is where most of the build time lives — and where bigger indexes take longer.

When the build finishes, the index moves to Ready state and is immediately usable across every other VDF AI product surface.

Build status, in plain language

| State | What it means | What to do |
| --- | --- | --- |
| Draft | You’re still configuring; not building yet | Finish the form and click Build |
| Running | Build in progress; you can watch the log | Wait; the time scales with source size |
| Ready | Index is live and searchable | Start using it in Chat, Agents, Networks |
| Needs attention | Build hit an issue (source unavailable, model busy) | Read the log; usually retry-able |
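If you track index status in your own scripts or runbooks, the four states can be modelled as a small state machine. This is a sketch under assumptions: the state names come from the table above, but the allowed transitions (including a rebuild moving a Ready index back to Running) are inferred for illustration and are not a documented VDF AI Data API.

```python
from enum import Enum


class BuildState(Enum):
    DRAFT = "Draft"
    RUNNING = "Running"
    READY = "Ready"
    NEEDS_ATTENTION = "Needs attention"


# Assumed transitions, inferred from the status table for illustration.
ALLOWED_TRANSITIONS = {
    BuildState.DRAFT: {BuildState.RUNNING},                # click Build
    BuildState.RUNNING: {BuildState.READY, BuildState.NEEDS_ATTENTION},
    BuildState.NEEDS_ATTENTION: {BuildState.RUNNING},      # retry the build
    BuildState.READY: {BuildState.RUNNING},                # rebuild
}


def can_move(current, target):
    """Return True if `target` is an assumed valid next state."""
    return target in ALLOWED_TRANSITIONS[current]


print(can_move(BuildState.DRAFT, BuildState.RUNNING))
print(can_move(BuildState.DRAFT, BuildState.READY))
```

The point of the sketch is that an index never becomes Ready without passing through Running, so anything that depends on the index should wait for Ready rather than for the build to start.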

Searching the index

Once an index is Ready, you search it from any product surface — or directly from the index detail page.

The search interface is plain language:

  • “What do customers say about onboarding speed?”
  • “Find every reference to refunds for the EU market.”
  • “Show me the comments about latency on the orders table.”

You can also pick how many results to return (top 5 is usually enough; bump to 20 for a broader sweep) and read the matching chunks ranked by relevance.
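For readers curious about what "ranked by relevance" can mean, here is a toy sketch of top-k search. It assumes cosine similarity as the relevance score, which is a common choice but not something this page confirms for VDF AI Data. The three-number vectors are made up by hand; a real embedding model produces much longer ones.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed_chunks, k=5):
    """Return the k best (score, text) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunks with hand-made vectors, for illustration only.
indexed_chunks = [
    ("Onboarding took under ten minutes", [0.9, 0.1, 0.0]),
    ("Refund for EU order was delayed", [0.1, 0.9, 0.1]),
    ("Orders table queries feel slow", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # stands in for an onboarding-related question

results = top_k(query_vec, indexed_chunks, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Asking for more results simply extends the ranked list further down, which is why a larger number gives a broader sweep but also more loosely related chunks.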

Search history

Every search is logged for you — the query, the time, how many results came back. Two reasons to look at the history:

  1. Refining a workflow. If you keep asking variations of the same question, your team probably needs a feature list, an agent, or a saved view of that question.
  2. Understanding usage. Workspace admins use search history to see which indexes are getting used and which aren’t.
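If you have a copy of the search history to work with, the usage question above comes down to a simple count. The sketch below is an assumption-heavy illustration: the entries, the field names (`index`, `query`, `time`, `results`), and the index names are all hypothetical and do not describe an actual VDF AI Data export format.

```python
from collections import Counter

# Hypothetical history entries; field names are assumptions for illustration.
history = [
    {"index": "support-comments", "query": "onboarding speed",
     "time": "2024-05-01T09:12", "results": 5},
    {"index": "support-comments", "query": "how fast is onboarding",
     "time": "2024-05-01T09:15", "results": 5},
    {"index": "eu-refunds", "query": "refunds for the EU market",
     "time": "2024-05-02T14:03", "results": 20},
]
all_indexes = ["support-comments", "eu-refunds", "orders-notes"]

# Count searches per index, then list indexes nobody searched.
searches_per_index = Counter(entry["index"] for entry in history)
unused_indexes = [name for name in all_indexes if searches_per_index[name] == 0]

print(searches_per_index.most_common())
print(unused_indexes)
```

An index that never shows up in the count is a candidate for retiring, and a cluster of near-identical queries on one index is a hint that the question deserves a saved view or an agent.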

When to rebuild an index

A vector index is a snapshot. It doesn’t refresh automatically — and most of the time you don’t want it to.

Two natural rebuild triggers:

  • The source changed substantially. A bulk reload, a schema change, or a new batch of documents.
  • You changed your mind about chunking or model. Rebuild with the new settings; compare the search results before/after.

Set a calendar reminder for a monthly rebuild on indexes that back important workflows. Routine, predictable, and you never get caught with a stale index powering a customer-facing search.
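If a calendar reminder is too easy to ignore, the same rule can be written as a small check. This is a sketch, assuming you record the last build date yourself; the function name `rebuild_due` and the 30-day threshold are illustrative choices, not product settings.

```python
from datetime import date, timedelta


def rebuild_due(last_built, today, max_age_days=30):
    """Return True when the index is at least `max_age_days` old."""
    return today - last_built >= timedelta(days=max_age_days)


# Hypothetical dates for illustration.
print(rebuild_due(date(2024, 1, 1), date(2024, 2, 5)))
print(rebuild_due(date(2024, 1, 20), date(2024, 2, 5)))
```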

A few patterns that work

One narrow index per use case, not one giant index

A focused index on “support ticket comments” produces sharper search than a sprawling index on “everything from the support database.” Multiple narrow indexes beat one wide one almost every time.

Pair an index with a feature list

If you’ve built a feature list for an asset, scope the index to the same list. The search and the analysis stay aligned.

Track which products use which index

In your team’s docs, write down which index powers which agent, network, or workflow. When you rebuild, you know what to retest.

Where to go next