Discovering your data | VDF AI Documentation

Most teams don’t actually know what’s in their data

There’s a database. It has hundreds of tables. Some are critical. Some are abandoned. Most look the same from the outside. Asking “what do we actually have?” is the first step in every data project — and traditionally it has taken weeks.

Discovery turns that question into a one-screen answer. Connect a source, run discovery, and watch VDF AI Data surface what’s actually inside.

Discovery is the moment a source becomes useful. A connected database that hasn't been discovered yet is just a hostname. After discovery, every table is a thing your team can search, profile, and reason about.

Who this is for

Analysts who want to see the lay of the land before writing a query.
Product and operations leads who own a process and want to know what data backs it.
Workspace admins who curate which assets the rest of the team should rely on.
New team members trying to learn what the team’s data actually looks like.

What discovery surfaces

When you trigger discovery on a connected source, VDF AI Data scans for every readable table, view, and dataset — and produces a card for each. Each card carries the things a person actually needs to know.

What it is

A clear name, the database it came from, and the column structure at a glance.

How big it is

Row count, so you know whether it's a 100-row reference table or a billion-row event log, plus a freshness timestamp that shows when it was last updated.

Who owns it

The owner field — usually a person or a team — so you know who to ask before you depend on it.

A quality score

A 0–100 score that summarizes completeness, consistency, freshness, and drift signals so you can sort obvious winners from obvious risks.

Comments

A running thread per asset. Use it to leave context that future-you and future teammates will need.

Browsing what you have

The Assets screen is your home base. From there you can:

Search by name or description. Type a word and watch the list narrow.
Filter by source. Show only the tables from one specific connection.
Filter by tag. “Show me everything tagged ‘finance-critical’.”
Sort by quality score. Spot risky assets quickly, or sort by freshness when you need the most recently updated tables.
Toggle to your favorites. Mark the tables you actually use and surface them at the top.

The screen is designed for people who don’t write SQL. You don’t need to know the schema by heart to find what you need.

Understanding the quality score

The quality score is the single number that captures “should I trust this table?” It rolls up a few signals:

Completeness. How many cells are missing values.
Consistency. Whether the same kind of value appears in the same kind of place.
Freshness. How recently the table was updated.
Drift. Whether the shape of the data has been changing.

The quality score is a starting point, not a verdict. A score of 95 doesn't mean the data is perfect for your use case. A score of 60 doesn't mean the data is unusable. Treat the score as the first signal, and pair it with a short read of the column profile before you build on top of the asset.
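To make the idea of a roll-up concrete, here is a minimal Python sketch of how four signals could be combined into one 0–100 number. It is not VDF AI Data's actual formula: the weights, the function names `rollup_score` and `completeness`, and the convention that each signal is a value between 0.0 and 1.0 are all assumptions made for illustration.

```python
# Hypothetical weights (they sum to 100). The weighting that VDF AI Data
# really uses is not documented here; these numbers are illustrative only.
WEIGHTS = {"completeness": 40, "consistency": 30, "freshness": 20, "drift": 10}


def completeness(rows: list[dict]) -> float:
    """Share of cells that hold a value (are not None) across all rows."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    return sum(value is not None for value in cells) / len(cells)


def rollup_score(signals: dict[str, float]) -> int:
    """Combine per-signal values in [0.0, 1.0] (1.0 is best) into a 0-100 score."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    # Clamp each signal to [0.0, 1.0] so one bad input cannot push the
    # result outside the 0-100 range.
    total = sum(
        weight * min(max(signals[name], 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
    return round(total)


if __name__ == "__main__":
    sample_rows = [{"id": 1, "country": None}, {"id": 2, "country": "DE"}]
    signals = {
        "completeness": completeness(sample_rows),
        "consistency": 0.8,
        "freshness": 1.0,
        "drift": 0.5,
    }
    print(rollup_score(signals))
```

A weighted average like this shows why a single number hides detail: two tables can land on the same score for very different reasons, which is why the column profile is still worth a read.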

A useful rule of thumb:

Score range	What to do
80–100	Safe defaults. Use confidently; spot-check what matters.
60–79	Investigate before depending on it. Run exploratory analysis.
Below 60	Treat with caution. Ask the owner before relying on it.
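If you export scores and want to apply the rule of thumb in bulk, the table translates into a small function. This is a sketch only: the function name `triage` and the three labels it returns are hypothetical, and the thresholds are taken directly from the table above.

```python
def triage(score: int) -> str:
    """Map a 0-100 quality score to the action suggested by the rule of thumb."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "use-and-spot-check"        # 80-100: safe default
    if score >= 60:
        return "investigate-first"         # 60-79: run exploratory analysis
    return "ask-the-owner"                 # below 60: treat with caution


if __name__ == "__main__":
    # Hypothetical asset names and scores, for illustration only.
    for name, score in {"orders": 92, "events_raw": 71, "legacy_users": 44}.items():
        print(f"{name}: {triage(score)}")
```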

Tagging and organizing

Tags are the cheapest, fastest way to bring order to a long list. A small set of conventional tags goes a long way.

A pattern that works for most teams:

By domain — customers, orders, support, marketing.
By sensitivity — pii, confidential, public.
By maturity — production, staging, experimental.
By status — needs-review, deprecated, golden-source.

Pick a small set, agree on them once with your team, and apply them consistently. Five well-chosen tags beat fifty inconsistent ones.
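One way to keep tags consistent is to write the agreed vocabulary down and check proposed tags against it. The sketch below assumes you keep that list yourself, for example in a shared script. `TAG_VOCABULARY` and `unknown_tags` are hypothetical names, not part of VDF AI Data, and the tag values are the examples from the pattern above.

```python
# The agreed tag vocabulary, grouped by the four dimensions described above.
TAG_VOCABULARY = {
    "domain": {"customers", "orders", "support", "marketing"},
    "sensitivity": {"pii", "confidential", "public"},
    "maturity": {"production", "staging", "experimental"},
    "status": {"needs-review", "deprecated", "golden-source"},
}


def unknown_tags(tags: list[str]) -> list[str]:
    """Return the tags that are not in the agreed vocabulary, sorted."""
    allowed = set().union(*TAG_VOCABULARY.values())
    return sorted(set(tags) - allowed)


if __name__ == "__main__":
    # "PII" differs from "pii" only by case, so it is flagged as unknown.
    print(unknown_tags(["orders", "PII", "production", "prod"]))
```

The check is deliberately case-sensitive: near-duplicates such as `PII` and `pii` are exactly the kind of inconsistency a small shared vocabulary is meant to prevent.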

Comments — the most underused feature

Every asset has a comment thread. Use it for the kinds of things you’d put in a wiki but never quite do:

“Don’t trust the country column before 2024 — we changed the encoding.”
“This was the original orders table. The replacement is orders_v2. Leave this here until Q3.”
“If row counts spike on a Monday, that’s the weekly batch — not an incident.”

Comments stay attached to the asset. New teammates find them automatically when they open the asset for the first time.

Versions and history

When the shape of an asset changes — a new column is added, a column type changes, rows get reloaded — VDF AI Data records the change. The asset’s history view shows:

When the change was detected.
What changed (columns added, columns removed, column types changed, row-count delta).
Whether the change moved the quality score up or down.
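The kind of comparison behind a history entry can be sketched as a diff between two snapshots of an asset. This is an illustration under stated assumptions, not VDF AI Data's implementation: the `Snapshot` class, the `diff_snapshots` function, and the keys of the returned dictionary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of an asset's shape at one point in time."""
    columns: dict[str, str]  # column name -> type name
    row_count: int


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict:
    """Describe what changed between two snapshots of the same asset."""
    old_names, new_names = old.columns.keys(), new.columns.keys()
    return {
        "columns_added": sorted(new_names - old_names),
        "columns_removed": sorted(old_names - new_names),
        # Columns present in both snapshots whose type differs.
        "columns_retyped": sorted(
            name for name in old_names & new_names
            if old.columns[name] != new.columns[name]
        ),
        "row_count_delta": new.row_count - old.row_count,
    }


if __name__ == "__main__":
    before = Snapshot({"id": "int", "country": "text", "legacy_flag": "bool"}, 1000)
    after = Snapshot({"id": "bigint", "country": "text", "region": "text"}, 1250)
    print(diff_snapshots(before, after))
```

Reading a diff like this next to the date a report broke is usually the fastest route to a diagnosis: a removed or retyped column tends to explain a failing query immediately.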

Two reasons to look at history:

Diagnosis. When a downstream report breaks, the asset’s history often tells you why.
Confidence. When a stakeholder asks “is this stable?”, a clean history is the best answer you can give them.

How discovery uses connections

Discovery runs against an already-connected source. If you don’t have a source connected yet, start with Connecting sources or Connecting databases.

Once connected:

Open the connection. From the Data area, click into the source you want to discover.
Run discovery. One click. VDF AI Data scans the readable tables and views in the scoped database.
Review what came back. A list of assets, each with metadata, ownership, quality, and tags ready to apply.
Tag the important ones. Five minutes of tagging pays back forever.
Open the assets that matter. For anything you'll depend on, run [exploratory analysis](/docs/products/vdf-ai-data/exploring-your-data) next.

When to re-run discovery

You don’t need to re-run discovery every day. Two natural triggers:

Schema changes upstream. Tables added, removed, or renamed.
A new team using the connection. A fresh discovery run and a tagging pass make the source feel curated rather than chaotic to new arrivals.

Set a calendar reminder for a monthly discovery refresh on your most-used sources. It’s a small investment that catches drift before downstream consumers feel it.

A useful first-week ritual

If you’ve just connected a major database and don’t know where to start:

Run discovery. Look at the full list.
Tag the top 20 assets. The ones you know matter to your team.
Open the top 5. Run exploratory analysis on each.
Comment what you learn. Quirks, gotchas, “ask Sarah” notes.

By the end of the week, your team has a curated, opinionated view of your data — built from what you actually use, not from a wishlist.

Where to go next

Exploring your data — go from “what we have” to “how healthy is it.”
Features and relationships — group what matters by use case.
Vector indexes and semantic search — make discovered assets searchable by meaning.
Connecting databases — set up the source before you can discover what’s in it.