Semantic Search & RAG Tool

The GitHub Semantic Code Search Tool

Search your vectorized GitHub data by meaning across repositories and components, with similarity scores and links back to the source — the retrieval layer for any agent that reasons over your codebase.

Meaning: Find code by intent
Scoped: Repos, components, or both
Linked: Results carry source URLs
100% on-prem: Code never leaves
The Codebase Problem

The function you need exists — somewhere

In a large codebase, finding the right implementation means knowing the file name or the exact symbol. Newcomers and agents alike waste time grepping for code they can’t name.

01

You must know the name

Text search only works if you already know what the code is called.

02

Concepts span files

"How do we handle auth?" isn’t one file — it’s a pattern scattered across the repo.

03

No relevance ranking

Grep returns every match equally; nothing tells you which one matters.

04

Proprietary code can’t leave

Your source is IP — it can’t go to a hosted code-search service.

How the Tool Works

Meaning-aware search across your repos

Semantics

Describe it, don’t name it

Find code by what it does.

The tool embeds your natural-language query and matches it against vectorized repositories and components, returning the most relevant code even when you don’t know the file or symbol name.

  • Natural-language code search
  • Repositories and/or components
  • Similarity threshold filtering
  • Scope to a single owner/repo
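As a rough illustration of what matching by meaning involves, the sketch below ranks a few indexed items by cosine similarity to a query vector. It is a minimal sketch under stated assumptions, not the tool's implementation: the three-dimensional vectors stand in for real embeddings, and the file paths and the `rank_by_meaning` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_meaning(query_vector, indexed_items, top_k=10):
    """Return the top_k (score, item) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, item["vector"]), item)
        for item in indexed_items
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical data).
index = [
    {"path": "auth/verify_token.py", "vector": [0.9, 0.1, 0.0]},
    {"path": "billing/invoice.py", "vector": [0.0, 0.2, 0.9]},
    {"path": "auth/session.py", "vector": [0.7, 0.3, 0.1]},
]
# Pretend embedding of the question "how do we handle auth?"
query_vector = [1.0, 0.2, 0.0]

ranked = rank_by_meaning(query_vector, index, top_k=2)
for score, item in ranked:
    print(f"{score:.3f}  {item['path']}")
```

The query never mentions a file or symbol name; the two auth-related items rank first only because their vectors point in a similar direction to the query vector.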
Intent
Code by Meaning

No file name needed

Embeddings, Repos, Components, Scoped

Precision

Ranked, linked, thresholded

Only the matches worth reading.

Each hit comes with a similarity score and a URL back to the source, and min_similarity lets an agent drop weak matches so it acts only on high-confidence results.
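The effect of the threshold can be sketched in a few lines. This is an illustrative example only: the `Hit` record, its field names, and the example URLs are assumptions, since this page does not show the tool's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    similarity: float  # assumed field name for the similarity score
    url: str           # assumed field name for the link back to the source


def keep_confident(hits, min_similarity=0.3):
    """Keep hits that reach the threshold, strongest first.

    A threshold of 0 is treated as "disabled", so every hit is kept.
    """
    if not 0.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be between 0 and 1")
    if min_similarity == 0:
        kept = list(hits)
    else:
        kept = [h for h in hits if h.similarity >= min_similarity]
    return sorted(kept, key=lambda h: h.similarity, reverse=True)


# Hypothetical results; the URLs are placeholders.
hits = [
    Hit(0.42, "https://example.com/acme/api/webhooks/verify.py"),
    Hit(0.81, "https://example.com/acme/api/webhooks/signature.py"),
    Hit(0.18, "https://example.com/acme/api/utils/strings.py"),
]

strict = keep_confident(hits, min_similarity=0.5)
default = keep_confident(hits)
everything = keep_confident(hits, min_similarity=0)
```

With these sample scores, the 0.5 threshold leaves a single hit, the 0.3 default leaves two, and a threshold of 0 returns all three.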

0–1
Similarity Gate

Filter low-confidence hits

Scored, Linked, Threshold, Precise

Governance

On-prem code retrieval

Your source stays your source.

The index and search run inside your perimeter, scoped per user with audit logging — making semantic code search usable on proprietary repositories that can’t touch hosted tools.

100%
On-Prem

Per-tenant, logged

On-prem, IP-safe, RBAC, Audit log
Inputs

Parameters

The github_vector_search tool accepts these inputs when an agent calls it. Required inputs are flagged.

Name            Type     Required  Default  Description
query           string   Required  -        Search query for semantic matching.
user_id         integer  Required  -        User ID for multi-tenant isolation.
collection      string   Optional  all      Type of GitHub data to search: repositories, components, or all.
top_k           integer  Optional  10       Maximum number of results to return (1–50).
owner           string   Optional  -        Repository owner; scopes the search when paired with repo.
repo            string   Optional  -        Repository name; scopes the search when paired with owner.
min_similarity  number   Optional  0.3      Minimum cosine similarity (0–1) a result must reach. Set to 0 to disable.
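One way to assemble a valid set of arguments for github_vector_search is sketched below. The parameter names, defaults, and ranges come from the table above; the `build_search_args` helper is hypothetical, and rejecting an owner without a repo (or the reverse) is an assumption about how strict a caller might want to be, not documented tool behavior. How the arguments are then passed to an agent or the tool is not shown here.

```python
from typing import Optional

ALLOWED_COLLECTIONS = {"repositories", "components", "all"}


def build_search_args(
    query: str,
    user_id: int,
    collection: str = "all",
    top_k: int = 10,
    owner: Optional[str] = None,
    repo: Optional[str] = None,
    min_similarity: float = 0.3,
) -> dict:
    """Validate inputs against the documented ranges and return an argument dict."""
    if not query.strip():
        raise ValueError("query is required")
    if collection not in ALLOWED_COLLECTIONS:
        raise ValueError("collection must be repositories, components, or all")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if not 0 <= min_similarity <= 1:
        raise ValueError("min_similarity must be between 0 and 1")
    # Assumption: treat owner and repo as a pair, since scoping needs both.
    if (owner is None) != (repo is None):
        raise ValueError("owner and repo must be provided together")

    args = {
        "query": query,
        "user_id": user_id,
        "collection": collection,
        "top_k": top_k,
        "min_similarity": min_similarity,
    }
    if owner is not None:
        args["owner"] = owner
        args["repo"] = repo
    return args


# Search everything indexed for user 42 (hypothetical user ID).
broad = build_search_args("where do we validate webhooks?", user_id=42)

# Scope to one repository and tighten the threshold (hypothetical owner/repo).
scoped = build_search_args(
    "where do we validate webhooks?",
    user_id=42,
    collection="components",
    top_k=5,
    owner="acme",
    repo="payments-api",
    min_similarity=0.5,
)
```

Validating before the call keeps an agent from sending a request the tool would have to reject, such as a top_k of 100 or an owner with no repo.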
Where it pays back

Where code search pays back

Codebase onboarding

Let a new engineer ask "where do we validate webhooks?" and jump straight to the code.

Reuse discovery

Find an existing utility before someone reimplements it.

Impact scoping

Locate every component that touches a concept before a refactor.

Agent grounding

Ground an engineering agent’s answers in your actual repositories.

Pattern audits

Find all places a risky pattern appears, by meaning rather than regex.

Cross-repo search

Search across many repos at once, or scope to one with owner/repo.

How VDF AI connects it

Assigned to agents, orchestrated as networks

On VDF AI, an industry’s use cases map to agents, and you assign tools like this one to those agents. Compose multiple agents into a governed, on-premise network.

ROI Snapshot

What changes after you assign it

Faster: Time to find the right code
Linked: Straight to the source
High-confidence: Only thresholded matches
100%: Searchable without code leaving
FAQ

Questions about the GitHub Semantic Code Search tool

What is GitHub semantic code search?

It is a tool that searches your vectorized GitHub repositories and components by meaning, returning ranked matches with similarity scores and source URLs. Assigned to an agent, it lets the agent find and reason over your real code without knowing exact file names.

Can I search a single repository?

Yes. Provide owner and repo together to scope the search to one repository, or omit them to search across everything indexed for the user.

What does min_similarity do?

It sets the minimum cosine similarity (0–1) a result must reach to be returned, defaulting to 0.3. Raising it makes results stricter; setting it to 0 disables the threshold.

Is our source code exposed?

No. Indexing and search run on-premise or in your sovereign cloud, scoped per user and audit-logged, so proprietary code never leaves your perimeter.

How does it pair with other tools?

It is often assigned alongside the GitHub repository explorer and code review tools so an agent can find code, inspect it, and review it — and combined in a network with other agents.

Let agents search your code by meaning

See GitHub semantic code search assigned to an engineering agent — on infrastructure you control.