Connecting databases

Why a database connection is different from a file upload

A file upload is a snapshot. A database connection is a living source. The same row your team edits in your operational system this morning is the row VDF AI sees this afternoon — without anyone having to export, upload, or re-sync.

That live link is what turns VDF AI Data from a document-search layer into a real working layer over your business: customers, orders, products, inventory, telemetry, support tickets, anything that lives in a database. Everything else the platform does (semantic search, exploratory analysis, feature engineering, fine-tuning preparation) can sit on top of it.

You don't need to be a database administrator to do this. If you have a hostname, a database name, and a read-only account, you have everything you need. The screens guide the rest.

Before you start

A handful of things make the first connection go smoothly. Have them ready:

A hostname. The address your database listens on. Often something like db-prod.acme.internal or a cloud-provider endpoint.
A database or schema name. The specific store you want VDF AI Data to look at — not the whole server.
A read-only username and password. Created just for VDF AI Data, with permission to read the schemas you want surfaced and nothing else.
Network reachability. From wherever VDF AI Data runs, your database must be reachable. If you use a private network, your platform team may need to allowlist a single egress address.
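Network reachability is the item most likely to need help from another team. One way to check it yourself, before filling in the form, is a plain TCP connection test run from the host where VDF AI Data runs. This is a minimal sketch using Python's standard library; it only shows that the port can be reached, not that credentials or scope are correct.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Checks only the network path (DNS, routing, firewall). It does not
    authenticate or speak the database's wire protocol.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures (socket.gaierror).
        return False
```

For example, `can_reach("db-prod.acme.internal", 5432)` checks the illustrative hostname from the list above on PostgreSQL's default port. A `False` result is the point at which to ask your platform team about the egress allowlist.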

Always use a dedicated read-only account. Don't reuse the same login your application uses. A separate account makes it easy to audit what VDF AI Data is doing and to revoke access in one click if you ever need to.
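For PostgreSQL, a dedicated read-only account scoped to one schema could be created with statements like the ones below. This is a sketch, assuming PostgreSQL; the role, database, and schema names are placeholders, and other databases express the same idea with different syntax. It builds the SQL as strings so a DBA can review it before running it. The password is left out on purpose and set separately.

```python
def postgres_readonly_grants(role: str, database: str, schema: str) -> list[str]:
    """Build PostgreSQL statements for a read-only login scoped to one schema."""

    def q(name: str) -> str:
        # Quote an identifier, doubling any embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    r, d, s = q(role), q(database), q(schema)
    return [
        f"CREATE ROLE {r} LOGIN;",
        # Sessions start read-only by default. This is a soft default the role
        # could override; the grants below are the actual access boundary.
        f"ALTER ROLE {r} SET default_transaction_read_only = on;",
        f"GRANT CONNECT ON DATABASE {d} TO {r};",
        f"GRANT USAGE ON SCHEMA {s} TO {r};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};",
        # Make tables created later in this schema readable as well
        # (applies to tables created by the role that runs this statement).
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT ON TABLES TO {r};",
    ]
```

Because this login belongs to VDF AI Data alone, revoking access later means disabling that one role (for example with `ALTER ROLE ... NOLOGIN`), without touching your application's login.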

The databases you can connect

VDF AI Data ships with first-party support for the most common operational and analytical databases.

PostgreSQL

The most common transactional database. Works with cloud-managed Postgres (RDS, Cloud SQL, Azure Database) and self-hosted instances.

MySQL

Includes MariaDB-compatible deployments and managed MySQL on every major cloud.

Microsoft SQL Server

On-premises SQL Server and Azure SQL. Pairs cleanly with an Active Directory service account or a connection-scoped login.

Oracle

Enterprise Oracle deployments, including the standard listener and service-name configuration.

SAP HANA

SAP HANA Cloud and on-premises. Useful when your team's data of record lives inside SAP.

Presto

Federated query layer over multiple underlying stores. One connection, many backends.

Exasol

For high-performance analytics on Exasol's MPP database.

Jira (as a structured source)

Connect Jira projects as queryable data — issues, fields, transitions — rather than as documents.

JDBC (everything else)

Anything with a published JDBC driver: Snowflake, BigQuery, Redshift, Trino, Vertica, and many more.

If your store isn’t listed by name, JDBC is almost always the answer. Tell us what you’re connecting to and we’ll confirm the right driver to use.
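JDBC drivers identify a database through a connection URL whose shape varies by driver. The sketch below shows the general form for a few stores commonly reached this way. The templates and default ports reflect commonly documented conventions, so confirm them against the driver version you use. Whether VDF AI Data asks for a full URL or for separate host, port, and database fields is not covered here.

```python
from typing import Optional

# (template, default port) per driver. Confirm against your driver's own docs;
# options such as SSL settings are appended differently by each driver.
JDBC_TEMPLATES = {
    "redshift": ("jdbc:redshift://{host}:{port}/{database}", 5439),
    "vertica": ("jdbc:vertica://{host}:{port}/{database}", 5433),
    # For Trino, "database" is the catalog, optionally followed by "/schema".
    "trino": ("jdbc:trino://{host}:{port}/{database}", 8080),
}


def jdbc_url(kind: str, host: str, database: str, port: Optional[int] = None) -> str:
    """Build a JDBC URL, falling back to the driver's usual default port."""
    template, default_port = JDBC_TEMPLATES[kind]
    chosen_port = port if port is not None else default_port
    return template.format(host=host, port=chosen_port, database=database)
```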

What setting up a connection looks like

You’ll see a single short form. Each field has a clear purpose.

Field	What it’s for
Name	A friendly label your team recognizes — “Production Orders DB,” “Analytics Warehouse.”
Type	The database type you’re connecting to.
Host & port	How VDF AI Data reaches your database over the network.
Database / schema	The specific store this connection should look at.
Credentials	A read-only username and password. Stored encrypted; never displayed back.
Description	Optional. A line about what this connection is for, who owns it, and where to ask if something changes.
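To make the form concrete, here is a hypothetical model of it as a Python dataclass. The field names and validation rules are assumptions for illustration, not VDF AI Data's internal schema. Note that `repr=False` only keeps the password out of printed output; encrypting it at rest is the platform's job.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConnectionConfig:
    """Hypothetical model of the connection form (field names are illustrative)."""
    name: str                # friendly label, e.g. "Production Orders DB"
    db_type: str             # database type, e.g. "postgresql"
    host: str
    port: int
    database: str            # database or schema this connection looks at
    username: str
    password: str = field(repr=False)  # excluded from repr so it never prints
    description: Optional[str] = None  # optional: purpose, owner, where to ask

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the form is complete."""
        problems = []
        for attr in ("name", "db_type", "host", "database", "username", "password"):
            if not getattr(self, attr).strip():
                problems.append(f"{attr} is required")
        if not 1 <= self.port <= 65535:
            problems.append("port must be between 1 and 65535")
        return problems
```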

Two patterns we recommend from day one:

Names that describe purpose, not infrastructure. “Customer-360 Warehouse” reads better than “redshift-prod-1.” Future you, six months from now, will be grateful.
A short description on every connection. A sentence is enough. “Read-only mirror of our operational CRM, refreshed nightly. Owner: data-platform@” gives a teammate everything they need to act on it.
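If your team manages many connections, both habits can be checked automatically. This is a rough heuristic sketch; the regular expression that flags infrastructure-style names is an assumption and will miss some cases.

```python
import re
from typing import Optional

# Heuristic: flag names shaped like instance IDs ("redshift-prod-1")
# or hostnames ("db-prod.acme.internal").
INFRA_NAME = re.compile(
    r"^[a-z0-9]+(-[a-z0-9]+)*-\d+$|\.(internal|com|net)$", re.IGNORECASE
)


def review_connection(name: str, description: Optional[str]) -> list[str]:
    """Return suggestions for a connection's name and description."""
    notes = []
    if INFRA_NAME.search(name):
        notes.append(f"'{name}' looks like infrastructure; describe its purpose instead")
    if not (description and description.strip()):
        notes.append("add a one-line description: purpose and owner")
    return notes
```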

Testing a connection

After saving, run the Test connection action. VDF AI Data runs three checks, in order:

Reaches the host on the network.
Authenticates with the credentials you provided.
Confirms the database or schema you named exists and is readable.

If any step fails, the screen tells you exactly which one. Most early failures are network or scope, not credentials — a friendly reminder that your firewall rules and your service-account permissions are the things to check first.
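The three checks run in a fixed order because each depends on the one before: you cannot authenticate to a host you cannot reach. This sketch mirrors that order using placeholder callables (`reach`, `authenticate`, and `schema_readable` are hypothetical, not a VDF AI Data API) and reports the first step that fails.

```python
from typing import Callable, Optional

# Each check returns True on success. You supply these; they are placeholders.
Check = Callable[[], bool]


def run_connection_test(reach: Check, authenticate: Check, schema_readable: Check) -> Optional[str]:
    """Run the checks in order; return the name of the first failing step, or None."""
    steps = [
        ("network", reach),
        ("authentication", authenticate),
        ("database/schema scope", schema_readable),
    ]
    for label, check in steps:
        if not check():
            # Stop here: later steps cannot succeed if this one failed.
            return label
    return None
```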

Connection states, in plain language

Every connection moves through a short lifecycle. The status indicator on the connection card tells you where you are.

State	What it means	What to do
Configuring	The connection is being set up; not active yet	Finish filling in the fields and save
Connected	Live and ready; downstream products can read from it	Start using it — discovery, EDA, search, fine-tuning
Needs attention	A test failed — authentication, host, or scope	Update what’s wrong and re-test
Paused	Temporarily disabled by a workspace admin	Resume when you’re ready
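The lifecycle behaves like a small state machine. The sketch below encodes the four states with transitions inferred from the descriptions in the states table; the event names and the exact set of allowed transitions are assumptions, and the product may permit others.

```python
from enum import Enum


class State(Enum):
    CONFIGURING = "Configuring"
    CONNECTED = "Connected"
    NEEDS_ATTENTION = "Needs attention"
    PAUSED = "Paused"


# Assumed transitions; event names are illustrative.
TRANSITIONS = {
    (State.CONFIGURING, "test_passed"): State.CONNECTED,
    (State.CONFIGURING, "test_failed"): State.NEEDS_ATTENTION,
    (State.CONNECTED, "test_failed"): State.NEEDS_ATTENTION,
    (State.NEEDS_ATTENTION, "test_passed"): State.CONNECTED,
    (State.CONNECTED, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.CONNECTED,
}


def next_state(current: State, event: str) -> State:
    """Return the state after an event, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid while {current.value}") from None
```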

Scoping a connection well

The most important decision is what to scope the connection to. The principle that applies to connected apps applies even more strongly to databases.

Connect a database to a schema, not an entire server. Narrower scope produces sharper answers and tighter security. You can always add a second connection later for another schema.

A few patterns that pay off:

One connection per logical purpose. “Production Orders,” “Marketing Events,” “Support Tickets” — not one mega-connection across everything.
Read-only at the database level. Defense in depth. Even if a downstream product tried to write, the database wouldn’t let it.
Document the owner. Use the Description field for “who owns this database and where to ask if something changes.”
Pair the connection with a refresh cadence. Decide once whether asset inventory refreshes nightly, on demand, or both, and stick with it.

What you can do once a database is connected

A connected database becomes a first-class source across the rest of VDF AI Data. From the connection’s detail panel, every other capability is one click away:

Discover what’s in it — see the tables, columns, sizes, and freshness without writing a query. See Discovering your data.
Explore the shape of the data — missing values, duplicates, outliers, drift signals. See Exploring your data.
Build feature lists — organize what’s interesting per use case. See Features and relationships.
Index for semantic search — make text-heavy columns searchable by meaning. See Vector indexes and semantic search.
Prepare a fine-tuning dataset — turn your real production data into training pairs. See Fine-tuning datasets.

You don’t have to plan to use all of these on day one. Connect, discover, and decide where to go from what you see.

What stays in your database (and what doesn’t)

This is the question almost everyone asks first. The short answer: your data stays in your database.

VDF AI Data queries your database on demand — it does not copy your tables wholesale.
Vector indexes and fine-tuning datasets are produced from a snapshot read at build time; you decide when to rebuild.
Pausing or removing a connection stops all reads immediately.
The database’s own access control stays in charge — VDF AI Data only ever sees what the connection account can see.

For the full picture of how VDF AI Data handles your data, see Privacy & Security.

Keeping connections healthy

A connection is a living relationship. A few small habits keep yours sharp.

Test on the day the environment was set up

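Five minutes of testing right after setup confirms that firewall rules and account grants are correct and gives you a known-good baseline. When something breaks later (a password rotation, a network change, schema drift), you know the problem is new.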

Refresh asset inventory after upstream changes

When your team adds or removes tables, the connection’s view of what’s available won’t update automatically. A manual Refresh inventory action keeps it in sync.
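Spotting what changed upstream comes down to a set difference between two table inventories. This sketch assumes you already have the table names from before and after. On PostgreSQL, MySQL, and SQL Server, the standard `information_schema.tables` view can list them; the `%s` placeholder in the query follows the style of drivers such as psycopg2, and other drivers use different placeholders.

```python
# Standard catalog query for base tables in one schema (PostgreSQL, MySQL,
# SQL Server). Oracle uses its own views, such as ALL_TABLES, instead.
LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = %s AND table_type = 'BASE TABLE'"
)


def diff_inventory(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Report tables added or removed between two inventory snapshots."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

A non-empty result is the signal to run Refresh inventory on the connection.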

Rotate credentials on a calendar

If your organization rotates database passwords, set a reminder to update the connection at the same time. The connection’s status will move to “Needs attention” as soon as a rotation lands.
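For a fixed-interval rotation policy, the reminder date is simple date arithmetic. This sketch assumes a policy expressed in days and a one-week lead time; both are placeholders for your organization's actual policy.

```python
from datetime import date, timedelta


def rotation_due(last_rotated: date, policy_days: int, remind_days_before: int = 7) -> tuple[date, date]:
    """Return (reminder date, rotation date) for a fixed-interval password policy."""
    due = last_rotated + timedelta(days=policy_days)
    return due - timedelta(days=remind_days_before), due
```

For example, a 90-day policy with a last rotation on 1 January 2024 gives a reminder on 24 March and a rotation date of 31 March (2024 is a leap year).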

Document ownership

The Description field is small. Use it. “Owned by data-platform@. Source of truth for orders. Escalate via #data-platform on Slack.” That one line saves a teammate forty minutes.

A short troubleshooting list

Symptom	Likely cause	What to try
“Host unreachable” on test	Firewall, VPC, or routing	Check egress allowlist; ask your platform team
“Authentication failed” on test	Wrong credentials or rotated password	Update credentials and re-test
“Database not found” on test	Wrong database/schema name or the account can’t see it	Confirm the name and the account’s grant
Connection works, but no tables appear	Asset inventory hasn’t been run yet	Trigger Refresh inventory from the connection
Worked yesterday, fails today	Password rotation, network change, or the connection was paused	Re-test; update credentials if needed
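If you run your own reachability check, the Python exception type already points at the right row of the table. This sketch covers only network-level errors from the standard library's `socket` module; authentication and "database not found" errors come from the database driver and vary by vendor, so they are left out. The message wording is illustrative.

```python
import socket


def triage_network_error(exc: OSError) -> str:
    """Map a low-level network error to a first troubleshooting step."""
    if isinstance(exc, socket.gaierror):
        # The hostname itself could not be resolved.
        return "Host unreachable: the hostname did not resolve; check DNS and the name you entered"
    if isinstance(exc, (ConnectionRefusedError, socket.timeout)):
        return "Host unreachable: check the egress allowlist, VPC routing, and firewall rules"
    return f"Unclassified network error ({type(exc).__name__}); re-test and check with your platform team"
```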

Where to go next

Discovering your data — what shows up after you connect, and how to read it.
Exploring your data — a one-click read on quality, completeness, and drift.
Vector indexes and semantic search — make text columns searchable by meaning.
Connecting sources — the broader picture: files, apps, and databases side-by-side.