Semantic Search & RAG Tool

The GitHub Semantic Code Search Tool

Search your vectorized GitHub data by meaning across repositories and components, with similarity scores and links back to the source — the retrieval layer for any agent that reasons over your codebase.


Meaning: Find code by intent

Scoped: Repos, components, or both

Linked: Results carry source URLs

100%: On-prem, code never leaves

The Codebase Problem

The function you need exists — somewhere

In a large codebase, finding the right implementation means knowing the file name or the exact symbol. Newcomers and agents alike waste time grepping for code they can’t name.

You must know the name

Text search only works if you already know what the code is called.

Concepts span files

"How do we handle auth?" isn’t one file — it’s a pattern scattered across the repo.

No relevance ranking

Grep returns every match equally; nothing tells you which one matters.

Proprietary code can’t leave

Your source is IP — it can’t go to a hosted code-search service.

How the Tool Works

Meaning-aware search across your repos

Semantics

Describe it, don’t name it

Find code by what it does.

The tool embeds your natural-language query and matches it against vectorized repositories and components, returning the most relevant code even when you don’t know the file or symbol name.

Natural-language code search
Repositories and/or components
Similarity threshold filtering
Scope to a single owner/repo
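
As a rough illustration of the matching step, here is a minimal sketch that ranks a toy in-memory index by cosine similarity. It assumes embeddings are plain lists of floats; the `cosine_similarity` and `rank` helpers, the file paths, and the three-dimensional vectors are all hypothetical and are not the tool's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, index, top_k=10):
    """Score every indexed item against the query and return the top_k, best first."""
    scored = [
        {"path": item["path"], "similarity": cosine_similarity(query_vec, item["vector"])}
        for item in index
    ]
    scored.sort(key=lambda hit: hit["similarity"], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions.
index = [
    {"path": "auth/session.py", "vector": [0.9, 0.1, 0.0]},
    {"path": "billing/invoice.py", "vector": [0.0, 0.2, 0.9]},
    {"path": "auth/webhook_verify.py", "vector": [0.7, 0.7, 0.0]},
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded natural-language query
results = rank(query_vec, index, top_k=2)
```

The query never mentions a file or symbol name; the ranking depends only on how close each stored vector is to the query vector.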

Intent

Code by Meaning

No file name needed

Embeddings, Repos, Components, Scoped

Precision

Ranked, linked, thresholded

Only the matches worth reading.

Each hit comes with a similarity score and a URL back to the source, and min_similarity lets an agent drop weak matches so it acts only on high-confidence results.
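
The thresholding described above can be sketched as a simple filter. This is an illustrative helper, not the tool's code: `apply_min_similarity`, the `similarity` and `url` field names, and the example.com URLs are assumptions. Only the 0–1 range, the 0.3 default, and "0 disables the threshold" come from the tool's description.

```python
def apply_min_similarity(hits, min_similarity=0.3):
    """Keep hits whose score reaches the threshold; a threshold of 0 disables filtering."""
    if not 0 <= min_similarity <= 1:
        raise ValueError("min_similarity must be between 0 and 1")
    if min_similarity == 0:
        return list(hits)
    return [hit for hit in hits if hit["similarity"] >= min_similarity]


# Hypothetical hits; the field names and URLs are placeholders.
hits = [
    {"similarity": 0.82, "url": "https://example.com/acme/api/auth/webhook_verify.py"},
    {"similarity": 0.41, "url": "https://example.com/acme/api/auth/session.py"},
    {"similarity": 0.12, "url": "https://example.com/acme/api/billing/invoice.py"},
]

default_hits = apply_min_similarity(hits)                    # default threshold of 0.3
confident = apply_min_similarity(hits, min_similarity=0.5)   # stricter gate
everything = apply_min_similarity(hits, min_similarity=0)    # threshold disabled
```

An agent that should act only on strong matches would raise the threshold; one doing broad discovery might lower it or turn it off.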

0–1

Similarity Gate

Filter low-confidence hits

Scored, Linked, Threshold, Precise

Governance

On-prem code retrieval

Your source stays your source.

The index and search run inside your perimeter, scoped per user with audit logging — making semantic code search usable on proprietary repositories that can’t touch hosted tools.

100%

On-Prem

Per-tenant, logged

On-prem, IP-safe, RBAC, Audit log

Inputs

Parameters

The github_vector_search tool accepts these inputs when an agent calls it. Required inputs are flagged.

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Required | | Search query for semantic matching. |
| user_id | integer | Required | | User ID for multi-tenant isolation. |
| collection | string | Optional | all | Type of GitHub data to search. Allowed values: repositories, components, all. |
| top_k | integer | Optional | 10 | Maximum number of results to return (1–50). |
| owner | string | Optional | | Optional repository owner to scope the search when paired with repo. |
| repo | string | Optional | | Optional repository name to scope the search when paired with owner. |
| min_similarity | number | Optional | 0.3 | Minimum cosine similarity (0–1) a result must reach. Set 0 to disable. |
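
To make the parameters listed above concrete, here is a hedged sketch of how a caller might assemble and validate the arguments before invoking github_vector_search. The `build_search_args` helper is hypothetical, and how the arguments are actually passed to the tool is not shown here. The parameter names, defaults, and ranges are taken from the list above; rejecting a lone owner or repo is an assumption based on the two being described as a pair.

```python
ALLOWED_COLLECTIONS = ("repositories", "components", "all")


def build_search_args(query, user_id, collection="all", top_k=10,
                      owner=None, repo=None, min_similarity=0.3):
    """Validate inputs against the documented ranges and return the argument dict."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(user_id, int):
        raise ValueError("user_id must be an integer")
    if collection not in ALLOWED_COLLECTIONS:
        raise ValueError(f"collection must be one of {ALLOWED_COLLECTIONS}")
    if not 1 <= top_k <= 50:
        raise ValueError("top_k must be between 1 and 50")
    if not 0 <= min_similarity <= 1:
        raise ValueError("min_similarity must be between 0 and 1")
    # Assumption: owner and repo only make sense together, so reject a lone value.
    if (owner is None) != (repo is None):
        raise ValueError("owner and repo must be provided together")

    args = {
        "query": query,
        "user_id": user_id,
        "collection": collection,
        "top_k": top_k,
        "min_similarity": min_similarity,
    }
    if owner is not None:
        args["owner"] = owner
        args["repo"] = repo
    return args


# Example: scope a search to one hypothetical repository, keeping the defaults.
args = build_search_args(
    "where do we validate webhooks?", user_id=42, owner="acme", repo="api"
)
```

Omitting owner and repo leaves them out of the dict entirely, which corresponds to searching everything indexed for the user.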

Where it pays back

Where code search pays back

Codebase onboarding

Let a new engineer ask "where do we validate webhooks?" and jump straight to the code.

Reuse discovery

Find an existing utility before someone reimplements it.

Impact scoping

Locate every component that touches a concept before a refactor.

Agent grounding

Ground an engineering agent’s answers in your actual repositories.

Pattern audits

Find all places a risky pattern appears, by meaning rather than regex.

Cross-repo search

Search across many repos at once, or scope to one with owner/repo.

How VDF AI connects it

Assigned to agents, orchestrated as networks

On VDF AI, an industry’s use cases map to agents, and you assign tools like this one to those agents. Compose multiple agents into a governed, on-premise network.

Industry (Your sector): Finance, healthcare, telecom, government, and more.
Use Case (A job to be done): Concrete workflows the business needs solved.
Agent (A specialized worker): Governed AI agents that execute the use case.
Tool (GitHub Semantic Code Search): The capability you assign to an agent.
Network (Agents, orchestrated): Many use cases and agents, working as one.

ROI Snapshot

What changes after you assign it

Faster

Time to find the right code

Linked

Straight to the source

High-confidence

Only thresholded matches

100%

Searchable without code leaving

FAQ

Questions about the GitHub Semantic Code Search tool

What is GitHub semantic code search?

It is a tool that searches your vectorized GitHub repositories and components by meaning, returning ranked matches with similarity scores and source URLs. Assigned to an agent, it lets the agent find and reason over your real code without knowing exact file names.

Can I search a single repository?

Yes. Provide owner and repo together to scope the search to one repository, or omit them to search across everything indexed for the user.

What does min_similarity do?

It sets the minimum cosine similarity (0–1) a result must reach to be returned, defaulting to 0.3. Raising it makes results stricter; setting it to 0 disables the threshold.

Is our source code exposed?

No. Indexing and search run on-premise or in your sovereign cloud, scoped per user and audit-logged, so proprietary code never leaves your perimeter.

How does it pair with other tools?

It is often assigned alongside the GitHub repository explorer and code review tools so an agent can find code, inspect it, and review it — and combined in a network with other agents.

Let agents search your code by meaning

See GitHub semantic code search assigned to an engineering agent — on infrastructure you control.
