Exploring your data | VDF AI Documentation

“Is this data any good?” — answered in one click

Before you build a report, train a model, or hand a dataset to a stakeholder, someone has to answer the most boring and most important question in data work: is this data actually any good?

That question used to take hours. You’d open a notebook, write a few queries, copy the output into a doc, and write up a verdict. By the time you finished, the data had often changed.

Exploratory Data Analysis (EDA) in VDF AI Data does it in one click. You pick an asset, hit Run analysis, and a few moments later you have a complete read on what’s healthy, what’s risky, and what’s quietly drifting.

Run EDA as soon as you start relying on an asset, not after something breaks. Five minutes of upfront EDA tells you what to expect, and turns “the data looks weird today” into “the data drifted in this column three weeks ago, and here’s why.”

Who this is for

EDA is designed for people who need to make decisions about data, not just compute on it:

Analysts sizing up a new dataset before building on top of it.
Product and operations leads validating that a source is healthy enough to depend on.
Workspace admins auditing what counts as a “golden source” and what doesn’t.
Anyone joining a team who needs to understand a dataset they didn’t build.

You don’t need to write SQL. You don’t need to know statistics. The screens explain themselves.

What EDA gives you

An EDA run produces two things: a summary for the whole table and a profile for each column.

The table-level summary

A short scorecard that captures the shape of the dataset at a glance.

Missing values

What share of cells across the table are empty. Low is good. A sudden jump from one run to the next signals that something changed upstream.

Duplicate rows

What share of rows are exact duplicates of another row. Often surprising — and often the first thing your downstream report needs to know about.

Outlier columns

Columns where the values look unusual relative to history — sudden spikes, unexpected nulls, value-set changes.

Class imbalance

For categorical columns, whether one value dominates so heavily it would distort downstream work. Common in fraud, churn, and rare-event data.

Overall quality score

A single rolled-up number you can sort and compare on. It uses the same scoring as the asset card after [discovery](/docs/products/vdf-ai-data/discovering-your-data).

Last run

When EDA was last run on this asset. Old analysis on a fast-moving table is a yellow flag of its own.
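To make the first few metrics concrete, here is a minimal Python sketch of how a missing-value share, a duplicate-row share, and a class-imbalance figure could be computed over a small table. It illustrates the ideas; it is not VDF AI Data’s implementation. The name table_summary, treating None and empty strings as missing, counting only the extra copies of a repeated row as duplicates, and measuring imbalance as the top value’s share are all assumptions made for this example. The overall quality score is left out because its formula isn’t documented.

```python
from collections import Counter


def is_missing(value):
    # Assumption: None and empty strings both count as empty cells.
    return value is None or value == ""


def table_summary(rows, categorical=None):
    """Rough table-level metrics for a list of equal-length row tuples.

    categorical: optional index of one categorical column to check for imbalance.
    """
    n_rows = len(rows)
    n_cells = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if is_missing(value))

    # A row repeated k times contributes k - 1 duplicates (the first copy is kept).
    duplicates = sum(count - 1 for count in Counter(rows).values())

    summary = {
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }

    if categorical is not None:
        values = [row[categorical] for row in rows if not is_missing(row[categorical])]
        if values:
            # Share held by the single most common value; near 100% means heavy imbalance.
            top_count = Counter(values).most_common(1)[0][1]
            summary["top_value_pct"] = 100 * top_count / len(values)

    return summary


rows = [
    ("a1", "US", 10.0),
    ("a2", "US", None),
    ("a1", "US", 10.0),
    ("a3", "DE", 7.5),
]
print(table_summary(rows, categorical=1))
```

In this made-up table, 1 of 12 cells is empty (about 8.3%), one row repeats an earlier row (25% of rows), and “US” holds 75% of the country column.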

The column profile

For each column, EDA produces a short profile.

Field	What it tells you
Type	Numeric, datetime, categorical, or text — inferred from the actual values.
Missing %	What share of rows have no value in this column.
Unique %	Distinct values as a share of rows. 100% means every row has a different value (an ID or a timestamp). 1% means a small set of values repeated across many rows (a status, a country).
Distribution summary	For numeric columns: min, max, mean, standard deviation. For categorical columns: the top values and their share.
Drift signal	A simple low / medium / high indicator: how much the column’s shape has shifted since the last EDA run.
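Most of these fields map onto simple calculations. The sketch below profiles one column held as a Python list. The names profile_column and infer_type, the type-inference rules (numbers, then datetimes, then “few distinct values” for categorical), and the use of all rows as the denominator for Unique % are assumptions for illustration; VDF AI Data infers types from the actual values too, but its exact rules aren’t described here. The drift signal is left out because it needs two runs to compare.

```python
import statistics
from collections import Counter
from datetime import datetime


def infer_type(present):
    """Guess a column type from its non-missing values (hypothetical rules)."""
    if not present:
        return "unknown"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(isinstance(v, datetime) for v in present):
        return "datetime"
    # Assumption: at most half as many distinct values as rows (minimum one) reads as categorical.
    if len(set(present)) <= max(1, len(present) // 2):
        return "categorical"
    return "text"


def profile_column(values, top_n=3):
    n = len(values)
    present = [v for v in values if v is not None and v != ""]
    kind = infer_type(present)
    profile = {
        "type": kind,
        "missing_pct": 100 * (n - len(present)) / n if n else 0.0,
        # Distinct non-missing values as a share of all rows.
        "unique_pct": 100 * len(set(present)) / n if n else 0.0,
    }
    if kind == "numeric":
        profile["summary"] = {
            "min": min(present),
            "max": max(present),
            "mean": statistics.mean(present),
            # Sample standard deviation needs at least two values.
            "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
        }
    elif kind == "categorical":
        # Top values and their share of non-missing rows, in percent.
        counts = Counter(present).most_common(top_n)
        profile["summary"] = {v: 100 * c / len(present) for v, c in counts}
    return profile


print(profile_column([3, 5, None, 7]))
print(profile_column(["active", "active", "cancelled", "active", None]))
```

The first call reports a numeric column with 25% missing; the second reports a categorical column where “active” holds three quarters of the non-missing rows.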

Reading a drift signal

Drift is the most commonly misunderstood concept in EDA. It’s also the most useful.

Drift means: “the shape of this column has changed in a way that’s worth noticing.” Not necessarily worth panicking over. Just worth noticing.

A few examples that make it concrete:

A country column that used to be 60% US starts trending toward 40% US: perhaps a marketing campaign is landing somewhere new.
A latency_ms column where the average doubled overnight: perhaps a deployment broke an upstream dependency.
A subscription_status column where “cancelled” is rising: a churn problem may be brewing.

VDF AI Data summarizes drift in three buckets:

Signal	What it means	What to do
Low	The column looks like itself	No action needed
Medium	The shape has moved, not dramatically	Open the column profile; check if it tracks a known event
High	The shape has changed substantively	Investigate before depending on this column
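VDF AI Data doesn’t document how it turns a change in shape into low, medium, or high. One standard way to compare two categorical distributions is total variation distance: half the sum of the absolute differences in each value’s share, ranging from 0 (identical) to 1 (no overlap). The sketch below uses it, with hypothetical cut-offs of 0.10 and 0.25, to bucket a version of the country example; the measure, the thresholds, and the function names are all assumptions, not the product’s method.

```python
from collections import Counter

# Hypothetical cut-offs: VDF AI Data does not publish its thresholds.
LOW_MAX = 0.10
MEDIUM_MAX = 0.25


def shares(values):
    """Share of each non-missing value in a categorical column."""
    present = [v for v in values if v is not None]
    total = len(present)
    return {v: c / total for v, c in Counter(present).items()} if total else {}


def total_variation(prev, curr):
    """Half the summed absolute share differences: 0 means identical, 1 means no overlap."""
    keys = set(prev) | set(curr)
    return 0.5 * sum(abs(prev.get(k, 0.0) - curr.get(k, 0.0)) for k in keys)


def drift_signal(prev_values, curr_values):
    """Bucket the distance between two runs of one column into low / medium / high."""
    distance = total_variation(shares(prev_values), shares(curr_values))
    if distance < LOW_MAX:
        return "low", distance
    if distance < MEDIUM_MAX:
        return "medium", distance
    return "high", distance


last_run = ["US"] * 60 + ["DE"] * 25 + ["BR"] * 15
this_run = ["US"] * 40 + ["DE"] * 25 + ["BR"] * 35
print(drift_signal(last_run, this_run))
```

With the US share falling from 60% to 40% and that share moving to BR in this made-up data, the distance comes out at about 0.2, which these thresholds call medium. A numeric column such as latency_ms would need its values binned first, or a comparison of mean and standard deviation instead.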

Drift isn't always bad. Sometimes drift is exactly what you wanted to see — a campaign worked, a fix shipped, a market expanded. EDA tells you something changed. You decide whether that's good news or bad.

How to read an EDA report

Three quick patterns:

When the score is high

If the summary score is high and no column is flagged high-drift, you can build on this asset with reasonable confidence. Spot-check the columns you’ll depend on most (distributions, unique counts) and you’re done.

When the score is mid-range

Open the column profile and look for one of two patterns: missing-value clusters (a chunk of columns where missingness suddenly rose) or drift clusters (multiple columns shifting in the same way). Either pattern usually points to an upstream change you’ll want to understand before depending on this asset.
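Spotting a missing-value cluster comes down to comparing each column’s Missing % between two runs. A minimal sketch, assuming you have those percentages per column from a previous and a current run; the function name, the 10-percentage-point rise, and the three-column minimum are hypothetical choices, not product settings.

```python
def missingness_cluster(prev_missing, curr_missing, rise_pp=10.0, min_columns=3):
    """Find columns whose Missing % rose sharply between two EDA runs.

    prev_missing, curr_missing: dicts mapping column name -> Missing % (0-100).
    rise_pp: hypothetical threshold, in percentage points.
    min_columns: hypothetical number of risen columns that counts as a cluster.
    Returns (sorted list of risen columns, whether they form a cluster).
    """
    risen = sorted(
        column
        for column, pct in curr_missing.items()
        # A column absent from the previous run is compared against 0%.
        if pct - prev_missing.get(column, 0.0) >= rise_pp
    )
    return risen, len(risen) >= min_columns


previous = {"email": 1.0, "phone": 2.0, "address": 0.5, "zip": 0.5, "name": 0.0}
current = {"email": 18.0, "phone": 21.0, "address": 19.5, "zip": 0.7, "name": 0.0}
print(missingness_cluster(previous, current))
```

Three contact columns losing values together, as in this made-up example, is the kind of pattern that usually traces back to a single upstream change.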

When the score is low

Don’t assume the data is broken; assume it’s misunderstood. Talk to the owner. The most common cause of low scores on a new connection isn’t bad data. It’s that the discovered scope included a few staging tables, a table that’s expected to be sparse, or an experimental dataset that hasn’t been cleaned yet. Excluding that handful of assets from the connection’s scope often resolves the apparent quality problem.

When to re-run EDA

EDA is cheap. Run it when something changes — and run it on a cadence for the assets that matter most.

On a new asset. Always. EDA is your first read.
After an upstream change. A new column, a renamed table, a refreshed load.
Before a major decision. Building a new report, training a new model, handing a dataset to an external team.
On a calendar for golden sources. Weekly or monthly EDA on the dozen assets your team relies on most.

What EDA does not do (and what to use instead)

EDA is great at telling you about the shape of the data. It’s not great at telling you:

Whether the data is correct. A column can score well, have few missing values, and be structurally pristine, yet still record the wrong value. Correctness is a domain question, not a statistical one.
Why drift happened. EDA flags drift. Investigating the cause means looking upstream — at deployments, schema changes, business events.
What to build with the data. EDA tells you what you have. Deciding what to do with it is the next step — see Features and relationships.

A short EDA habit that pays off

A pattern that works on most teams:

On every new connection, run EDA on the five assets you expect to rely on most.
Every Monday morning, glance at the EDA dashboard for your team’s golden sources. Drift signals jump out fast.
At every quarterly review, run a fresh EDA pass on your full asset list, and update tags or comments for anything that moved.

Ten minutes, three habits, and your team always knows what’s healthy.

Where to go next

Features and relationships — turn what you found into something the team can use.
Discovering your data — the step before EDA: surfacing what you have.
Vector indexes and semantic search — make text columns searchable once you trust them.
Connecting databases — for the moment you need a new source.