From “what we have” to “what matters”
After discovery and exploratory analysis, you know what’s in your data. The next question is sharper: of all the columns and tables you have, which ones matter for the thing you’re actually trying to do?
That’s the question feature lists answer. A feature list is a small, named, opinionated view of your data: the handful of columns that matter for one use case. Marketing has its list. Risk has its own. So does customer success. Each list pulls from the same underlying assets but is maintained separately.
Feature lists are how a database stops being intimidating. A 400-column table is overwhelming. The 12-column list that matters for churn prediction is a tool you can actually use.
Who this is for
- Analysts who repeatedly pull the same handful of fields and want to stop reinventing the wheel.
- Data leads who want to publish opinionated, reusable views for their team.
- Anyone preparing data for an agent or a model who needs to capture “these are the inputs that matter.”
Three things you can do here
This page covers three connected capabilities. Each builds on the one before it.
- Feature lists: hand-curated views of “the columns that matter for X.” One asset can have many lists, one per use case.
- Feature discovery: VDF AI Data suggests derived features (combinations and transformations of existing columns), with a confidence score on each.
- Association analysis: a relationship map showing which columns move together, which depend on each other, and which are surprisingly correlated.
Feature lists
A feature list is a saved, named selection of columns from an asset.
You give it a purpose (“Churn-prediction features,” “Marketing-campaign inputs,” “Risk-scoring inputs”) and pick the columns that matter. From then on, anyone on your team can open the list and see exactly what’s in scope — without having to ask which 12 of the 400 columns you actually use.
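If it helps to think of a feature list as data, here is a minimal sketch in plain Python. It is not the VDF AI Data API: the `FeatureList` class, its fields, and the sample row are all hypothetical, and exist only to show the idea of "a named selection of columns that scopes what you look at."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureList:
    """A saved, named selection of columns from one asset (hypothetical model)."""

    name: str
    asset: str
    columns: tuple[str, ...]

    def project(self, row: dict) -> dict:
        """Keep only the in-scope columns of a row, in list order."""
        missing = [c for c in self.columns if c not in row]
        if missing:
            raise KeyError(f"asset row is missing columns: {missing}")
        return {c: row[c] for c in self.columns}


churn = FeatureList(
    name="Churn-prediction features",
    asset="customers",
    columns=("tenure_months", "plan_tier", "support_ticket_count"),
)

# An illustrative row from a wide table; only three columns are in scope.
row = {
    "customer_id": 17,
    "tenure_months": 26,
    "plan_tier": "pro",
    "support_ticket_count": 3,
    "signup_channel": "referral",
}

scoped = churn.project(row)
print(scoped)
```

The point of the sketch is the shape, not the code: a name, an asset, and an explicit column selection that anyone can read.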
Why multiple lists per asset
Different teams use the same data for different things. A single customers table might back:
- A marketing list focused on segment, channel, engagement, last-touch.
- A risk list focused on tenure, payment history, dispute counts, geography.
- A support list focused on plan tier, last contact, open ticket count.
Each list is a small, focused, named thing. None of them has to compromise for the others.
Marking a default list
Most teams have one list that’s the “obvious starting point” for an asset. Mark it as the default and it becomes what teammates see first when they open the asset.
A useful pattern: one default list per asset, plus as many specialized lists as you need. The default answers “what should I use if I don’t know yet?” The specialized lists answer “what should I use if I know exactly what I’m doing?”
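The "one default plus specialized lists" pattern can be sketched as a small lookup. Everything here is an assumption made for illustration (the `AssetLists` class, its method names, and the column names); the product may model defaults differently.

```python
from typing import Optional


class AssetLists:
    """Feature lists for one asset, with at most one default (hypothetical)."""

    def __init__(self, asset: str) -> None:
        self.asset = asset
        self._lists: dict[str, list[str]] = {}
        self._default: Optional[str] = None

    def add(self, name: str, columns: list[str], default: bool = False) -> None:
        self._lists[name] = list(columns)
        if default:
            self._default = name  # replaces any earlier default

    def resolve(self, name: Optional[str] = None) -> list[str]:
        """Return the named list, or the default when no name is given."""
        key = name if name is not None else self._default
        if key is None:
            raise LookupError(f"{self.asset} has no default feature list")
        return self._lists[key]


customers = AssetLists("customers")
customers.add("marketing", ["segment", "channel", "engagement", "last_touch"], default=True)
customers.add("risk", ["tenure", "payment_history", "dispute_count", "geography"])
customers.add("support", ["plan_tier", "last_contact", "open_ticket_count"])

starting_point = customers.resolve()     # "I don't know yet": the default
risk_columns = customers.resolve("risk")  # "I know exactly what I'm doing"
```

Keeping a single default slot per asset is the design choice that makes "what should I use if I don't know yet?" have exactly one answer.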
Editing and versioning
Feature lists are not frozen. As the underlying data evolves — new columns added, deprecated columns removed — you update the list. VDF AI Data keeps a small history so you can see what the list looked like a month ago and why a downstream report might be reading a different shape today.
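To make "what did the list look like a month ago" concrete, here is a rough sketch of comparing two versions of a list. The history format, dates, and column names are made up for the example; how VDF AI Data actually stores history is not described here.

```python
from datetime import date


def diff_columns(old, new):
    """Report which columns were added or removed between two versions."""
    return {
        "added": [c for c in new if c not in old],
        "removed": [c for c in old if c not in new],
    }


# Illustrative history: (saved_on, columns). Dates and columns are invented.
history = [
    (date(2024, 5, 1), ["tenure_months", "plan_tier", "legacy_score"]),
    (date(2024, 6, 1), ["tenure_months", "plan_tier", "support_ticket_count"]),
]

oldest, latest = history[0][1], history[-1][1]
change = diff_columns(oldest, latest)
print(change)
```

A diff like this is usually enough to explain why a downstream report started reading a different shape.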
Feature discovery
Feature discovery is the AI step. You point it at an asset, ask it to look for useful derived features, and it returns a list of candidates — each with a one-line description and a confidence score.
A derived feature is something computed from one or more existing columns. A few examples that show up frequently:
- Aggregates — “average order value per customer over the last 90 days.”
- Ratios — “support tickets per active month.”
- Recency signals — “days since last login.”
- Flags — “ever-cancelled customer,” “ever-disputed transaction.”
- Bucketed values — “tenure in months, grouped into bands.”
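The five kinds of derived feature above can each be computed in a line or two. The sketch below uses plain Python and invented sample data for one customer. The field names, the fixed "as of" date, and the band boundaries are assumptions chosen for illustration, not what feature discovery produces.

```python
from datetime import date

AS_OF = date(2024, 6, 30)  # fixed reference date so the example is reproducible

# Invented sample data for one customer.
orders = [  # (order_date, amount)
    (date(2024, 1, 15), 200.0),  # older than 90 days, so excluded below
    (date(2024, 5, 10), 40.0),
    (date(2024, 6, 20), 60.0),
]
customer = {
    "last_login": date(2024, 6, 25),
    "support_tickets": 6,
    "active_months": 12,
    "tenure_months": 26,
    "cancelled_on": None,
}


def days_since(day):
    return (AS_OF - day).days


def tenure_band(months):
    """Bucket tenure into bands; the boundaries here are arbitrary."""
    if months < 6:
        return "0-5"
    if months < 12:
        return "6-11"
    if months < 24:
        return "12-23"
    return "24+"


recent_amounts = [amount for day, amount in orders if days_since(day) <= 90]

features = {
    # Aggregate: average order value over the last 90 days
    "avg_order_value_90d": (
        sum(recent_amounts) / len(recent_amounts) if recent_amounts else None
    ),
    # Ratio: support tickets per active month (guard against zero months)
    "tickets_per_active_month": (
        customer["support_tickets"] / customer["active_months"]
        if customer["active_months"]
        else None
    ),
    # Recency signal: days since last login
    "days_since_last_login": days_since(customer["last_login"]),
    # Flag: ever-cancelled customer
    "ever_cancelled": customer["cancelled_on"] is not None,
    # Bucketed value: tenure grouped into bands
    "tenure_band": tenure_band(customer["tenure_months"]),
}
print(features)
```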
You don’t have to take them all. The screen is built for triage: read the suggestion, look at the confidence score, accept it into a feature list or dismiss it.
Confidence is a signal, not a verdict. A high confidence score means the suggestion is statistically promising for the kind of work this asset usually feeds. It doesn't mean the feature is right for your particular use case. Read the description. Trust your domain knowledge over the score.
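One way to act on "a signal, not a verdict" is to let the score decide the order you read suggestions in, and nothing more. The sketch below assumes scores fall between 0 and 1 and uses a hypothetical `Suggestion` record and threshold; neither is taken from the product.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    name: str
    description: str
    confidence: float  # assumed to be in [0, 1]


def triage(suggestions, review_threshold=0.7):
    """Split candidates into 'review first' and 'review later'.

    Nothing is accepted automatically: the score only sets reading order.
    """
    ordered = sorted(suggestions, key=lambda s: s.confidence, reverse=True)
    first = [s for s in ordered if s.confidence >= review_threshold]
    later = [s for s in ordered if s.confidence < review_threshold]
    return first, later


suggestions = [
    Suggestion("tenure_band", "Tenure in months, grouped into bands", 0.42),
    Suggestion("days_since_last_login", "Days since the last login", 0.91),
    Suggestion("tickets_per_active_month", "Support tickets per active month", 0.78),
]

review_first, review_later = triage(suggestions)

# A person still makes the call, using the description and domain knowledge.
accepted = [s.name for s in review_first if s.name != "tickets_per_active_month"]
```

Note that the low-scoring suggestion is deferred, not discarded. A feature with a modest score can still be the right one for your use case.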
When to run feature discovery
- Before a new project. Discovery surfaces things you’d have spent a week deriving by hand.
- After a schema change. New columns may unlock new derived features.
- As a periodic refresh. Quarterly discovery on important assets keeps your team’s feature library fresh.
Association analysis
The third capability is the most powerful — and the most surprising. Association analysis maps the relationships between columns in your data.
Pick an asset, run analysis, and VDF AI Data produces a relationship view: which columns move together, which seem to depend on each other, and which are correlated in a way you didn’t expect.
What it surfaces
Three classes of relationships you’ll see most often:
- Expected relationships: country and currency; signup_date and tenure_months. The plumbing of the schema, confirmed.
- Strong-but-not-obvious relationships: plan_tier and support_ticket_count; channel and conversion_rate. The kinds of things a domain expert would have suspected but few others would have seen.
- Surprising relationships: a column you thought was unrelated turns out to track another column closely. These are the moments worth investigating. They often point to a data-collection quirk, an upstream coupling, or, occasionally, a real business insight.
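Which class a relationship falls into depends on what you already believed, not on the data alone. A small sketch of that idea: you write down the pairs you expect and the pairs a domain expert suspects, and anything else that shows up is a surprise. The `classify` function and the last pair (`signup_channel` and `dispute_count`) are hypothetical.

```python
def classify(pair, expected, suspected):
    """Label a column pair by prior belief; pair order does not matter."""
    key = frozenset(pair)
    if key in expected:
        return "expected"
    if key in suspected:
        return "strong but not obvious"
    return "surprising"


# Prior beliefs, written down before looking at the analysis output.
expected = {
    frozenset(("country", "currency")),
    frozenset(("signup_date", "tenure_months")),
}
suspected = {
    frozenset(("plan_tier", "support_ticket_count")),
    frozenset(("channel", "conversion_rate")),
}

labels = {
    pair: classify(pair, expected, suspected)
    for pair in [
        ("currency", "country"),
        ("plan_tier", "support_ticket_count"),
        ("signup_channel", "dispute_count"),  # invented example of a surprise
    ]
}
print(labels)
```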
How to read the output
Each relationship comes with a strength indicator. Strong relationships are louder. Weak relationships are quieter. The screen lets you filter to “show me only strong relationships” or “show me everything above a threshold.”
A pattern that works: scan strong relationships first to verify what you expected, look for surprises, then dig into one or two surprises with the asset open in EDA to understand why.
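To see what "filter to strong relationships" means mechanically, here is a sketch that scores every pair of numeric columns and keeps those above a threshold. It assumes the strength indicator behaves like the absolute value of a Pearson correlation. That is an assumption for illustration only; the measure VDF AI Data uses is not stated here, and the sample columns are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented sample columns for six customers.
columns = {
    "tenure_months": [1, 4, 8, 12, 20, 30],
    "support_ticket_count": [9, 7, 6, 4, 3, 1],
    "logins_last_30d": [5, 20, 3, 18, 7, 12],
}


def relationships(cols, threshold=0.0):
    """Return (col_a, col_b, strength) for pairs at or above the threshold,
    strongest first. Strength is |r|, so negative relationships count too."""
    scored = [
        (a, b, abs(pearson(cols[a], cols[b])))
        for a, b in combinations(cols, 2)
    ]
    kept = [rel for rel in scored if rel[2] >= threshold]
    return sorted(kept, key=lambda rel: rel[2], reverse=True)


everything = relationships(columns)
strong_only = relationships(columns, threshold=0.8)
```

With this sample data, only the tenure and ticket-count pair clears the 0.8 threshold. Using the absolute value matters: a strongly negative relationship is just as worth verifying as a strongly positive one.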
What association analysis isn’t
A correlation is not a cause. If tenure and revenue are strongly correlated, that’s interesting — but you don’t know yet whether longer tenure drives more revenue, or whether higher-revenue customers stay longer, or whether something else explains both. Association analysis points to where to look. The investigation is still up to you.
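A small worked example shows why. In the invented data below, a third column (`account_size`) drives both tenure and revenue. Tenure and revenue end up strongly correlated even though neither was built from the other, and the correlation is the same in both directions, so the number alone cannot tell you which way any effect runs. All names and values are illustrative.

```python
from math import isclose, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# A hidden driver: both columns below are generated from account_size
# plus a little unrelated wobble.
account_size = [1, 2, 3, 4, 5, 6, 7, 8]
tenure = [3 * s + e for s, e in zip(account_size, [1, -1, 0, 1, -1, 0, 1, -1])]
revenue = [50 * s + e for s, e in zip(account_size, [-10, 5, 10, -5, 0, 10, -10, 5])]

r_forward = pearson(tenure, revenue)
r_backward = pearson(revenue, tenure)

print(round(r_forward, 3), round(r_backward, 3))
```

Both numbers come out high and identical. Nothing in them distinguishes "tenure drives revenue" from "revenue drives tenure" from "something else drives both."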
Don't ship a strategy off a single association. Strong correlation is a hypothesis generator, not a decision generator. Treat surprises as questions worth investigating, not as conclusions worth acting on.
Putting all three together
A workflow that combines feature lists, discovery, and association analysis:
- Start with a feature list. Pick the columns you already know matter for your use case. Save it.
- Run feature discovery. Accept the high-confidence derived features that look useful. Add them to your list.
- Run association analysis. Look for surprises. Investigate the strongest ones.
- Refine the list. Drop columns you don’t actually need. Add ones the analysis pointed you to.
- Make it the default. Mark the list as the asset’s default if it’s the obvious starting point for your team.
By the end, your asset has a curated, opinionated view that captures what matters — and a paper trail of how you got there.
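The workflow can be sketched as a list that records why each change was made. This is a plain-Python illustration of the "paper trail" idea, not how the product stores it; the `DraftList` class and every column name and reason are hypothetical.

```python
class DraftList:
    """A feature list that logs each change and the reason for it (hypothetical)."""

    def __init__(self, name, columns):
        self.name = name
        self.columns = list(columns)
        self.is_default = False
        self.trail = [f"created with {len(self.columns)} columns"]

    def add(self, column, reason):
        if column not in self.columns:
            self.columns.append(column)
            self.trail.append(f"added {column}: {reason}")

    def drop(self, column, reason):
        if column in self.columns:
            self.columns.remove(column)
            self.trail.append(f"dropped {column}: {reason}")

    def mark_default(self):
        self.is_default = True
        self.trail.append("marked as default")


# Start with the columns you already know matter.
churn = DraftList("Churn-prediction features", ["tenure_months", "plan_tier", "region"])

# Accept a derived feature from discovery.
churn.add("days_since_last_login", "discovery suggestion, accepted after review")

# Add a column the association analysis pointed to, after investigating it.
churn.add("support_ticket_count", "tracks plan_tier closely; confirmed in EDA")

# Refine: drop what you don't actually need.
churn.drop("region", "not used by the churn model")

# Make it the default for the asset.
churn.mark_default()

for entry in churn.trail:
    print(entry)
```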
Why this matters downstream
Everything else in VDF AI Data — semantic search, fine-tuning, agents, networks — gets sharper when the source has a feature list.
- A search index built from a feature list is more focused.
- A fine-tuning dataset that uses a feature list is cleaner and faster to assemble.
- An agent pointed at a feature list has a clearer picture of “what to look at.”
Feature lists aren’t a side capability. They’re the bridge between raw data and useful AI.
Where to go next
- Vector indexes and semantic search — use feature lists to scope what gets indexed.
- Fine-tuning datasets — turn feature lists into training pairs.
- Exploring your data — go deeper on the quality of the columns in your list.
- Discovering your data — the upstream step of finding assets in the first place.