Why a database connection is different from a file upload
A file upload is a snapshot. A database connection is a living source. The same row your team edits in your operational system this morning is the row VDF AI sees this afternoon — without anyone having to export, upload, or re-sync.
That live link is what turns VDF AI Data from a document-search layer into a real working layer over your business: customers, orders, products, inventory, telemetry, support tickets, anything that lives in a database. Every other thing the platform does — semantic search, exploratory analysis, feature engineering, fine-tuning preparation — can sit on top of it.
You don't need to be a database administrator to do this. If you have a hostname, a database name, and a read-only account, you have everything you need. The screens guide the rest.
Before you start
A handful of things make the first connection go smoothly. Have them ready:
- A hostname. The address your database listens on. Often something like db-prod.acme.internal or a cloud-provider endpoint.
- A database or schema name. The specific store you want VDF AI Data to look at, not the whole server.
- A read-only username and password. Created just for VDF AI Data, with permission to read the schemas you want surfaced and nothing else.
- Network reachability. From wherever VDF AI Data runs, your database must be reachable. If you use a private network, your platform team may need to allowlist a single egress address.
Always use a dedicated read-only account. Don't reuse the same login your application uses. A separate account makes it easy to audit what VDF AI Data is doing and to revoke access in one click if you ever need to.
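As a rough illustration for PostgreSQL, a dedicated read-only account could be provisioned with statements like the ones this sketch generates. The role name `vdf_reader`, the database `orders`, and the schema `sales` are hypothetical, the helper itself is not part of VDF AI Data, and other database engines use different grant syntax. Treat it as a starting point to review with whoever administers your database.

```python
def read_only_grants(role: str, database: str, schema: str) -> list[str]:
    """Return PostgreSQL statements that set up a read-only login role.

    The password is deliberately left out so it can be set separately
    and never ends up in a script or shell history.
    """
    def q(ident: str) -> str:
        # Quote an identifier, escaping any embedded double quotes.
        return '"' + ident.replace('"', '""') + '"'

    return [
        f"CREATE ROLE {q(role)} LOGIN;",
        f"GRANT CONNECT ON DATABASE {q(database)} TO {q(role)};",
        f"GRANT USAGE ON SCHEMA {q(schema)} TO {q(role)};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {q(schema)} TO {q(role)};",
        # Covers tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {q(schema)} "
        f"GRANT SELECT ON TABLES TO {q(role)};",
    ]


# Hypothetical names, for illustration only.
for statement in read_only_grants("vdf_reader", "orders", "sales"):
    print(statement)
```

Because the account is granted only CONNECT, USAGE, and SELECT, revoking it later is a single operation on one role and does not touch your application's login.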
The databases you can connect
VDF AI Data ships with first-party support for the most common operational and analytical databases.
PostgreSQL
The most common transactional database. Works with cloud-managed Postgres (RDS, Cloud SQL, Azure Database) and self-hosted instances.
MySQL
Includes MariaDB-compatible deployments and managed MySQL on every major cloud.
Microsoft SQL Server
On-premises SQL Server and Azure SQL. Pairs cleanly with an Active Directory service account or a connection-scoped login.
Oracle
Enterprise Oracle deployments, including the standard listener and service-name configuration.
SAP HANA
SAP HANA Cloud and on-premises. Useful when your team's data of record lives inside SAP.
Presto
Federated query layer over multiple underlying stores. One connection, many backends.
Exasol
For high-performance analytics on Exasol's MPP database.
Jira (as a structured source)
Connect Jira projects as queryable data — issues, fields, transitions — rather than as documents.
JDBC (everything else)
Anything with a published JDBC driver: Snowflake, BigQuery, Redshift, Trino, Vertica, and many more.
If your store isn’t listed by name, JDBC is almost always the answer. Tell us what you’re connecting to and we’ll confirm the right driver to use.
What setting up a connection looks like
You’ll see a single short form. Each field has a clear purpose.
| Field | What it’s for |
|---|---|
| Name | A friendly label your team recognizes — “Production Orders DB,” “Analytics Warehouse.” |
| Type | The database type you’re connecting to. |
| Host & port | How VDF AI Data reaches your database over the network. |
| Database / schema | The specific store this connection should look at. |
| Credentials | A read-only username and password. Stored encrypted; never displayed back. |
| Description | Optional. A line about what this connection is for, who owns it, and where to ask if something changes. |
Two patterns we recommend from day one:
- Names that describe purpose, not infrastructure. “Customer-360 Warehouse” reads better than “redshift-prod-1.” Future you, six months from now, will be grateful.
- A short description on every connection. A sentence is enough. “Read-only mirror of our operational CRM, refreshed nightly. Owner: data-platform@” gives a teammate everything they need to act on it.
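If it helps to see the form as data, the fields above map onto a small record. This is an illustration only: `ConnectionConfig` and its field names are hypothetical and are not a VDF AI Data API. The sketch shows what a complete, well-described connection carries and one way to keep the password out of printed output.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectionConfig:
    name: str
    db_type: str
    host: str
    port: int
    database: str
    username: str
    password: str = field(repr=False)  # excluded from repr(), so it is not printed
    description: str = ""

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing is missing."""
        issues = []
        for label in ("name", "db_type", "host", "database", "username", "password"):
            if not getattr(self, label).strip():
                issues.append(f"{label} is required")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if not self.description.strip():
            issues.append("description is empty (optional, but recommended)")
        return issues


# Hypothetical values; the password is a placeholder.
orders = ConnectionConfig(
    name="Production Orders DB",
    db_type="postgresql",
    host="db-prod.acme.internal",
    port=5432,
    database="orders",
    username="vdf_reader",
    password="change-me",
    description="Read-only mirror of our operational CRM, refreshed nightly. "
                "Owner: data-platform@",
)
print(orders)             # the password field does not appear
print(orders.problems())  # empty when every field is filled in
```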
Testing a connection
After saving, run the Test connection action. VDF AI Data does three things:
- Reaches the host on the network.
- Authenticates with the credentials you provided.
- Confirms the database or schema you named exists and is readable.
If any step fails, the screen tells you exactly which one. Most early failures are network or scope, not credentials — a friendly reminder that your firewall rules and your service-account permissions are the things to check first.
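The three checks can be reproduced outside the product when you want to narrow down a failure yourself. The sketch below is an assumption about the general shape of such a test, not VDF AI Data's implementation: `host_reachable` and `run_connection_test` are hypothetical helpers, and the authentication and scope checks are passed in as callables because they depend on your database driver.

```python
import socket
from typing import Callable


def host_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Step 1: can a TCP connection be opened to the host and port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes DNS failures, refusals, and timeouts
        return False


def run_connection_test(
    reach: Callable[[], bool],
    authenticate: Callable[[], bool],
    scope_readable: Callable[[], bool],
) -> str:
    """Run the three checks in order and name the first one that fails."""
    steps = [
        ("network", reach),
        ("authentication", authenticate),
        ("scope", scope_readable),
    ]
    for name, check in steps:
        if not check():
            return f"failed: {name}"  # later steps are skipped
    return "ok"


# Stubbed example: the network and credentials are fine, but the account
# cannot read the named schema. In practice, `reach` would be something like
# lambda: host_reachable("db-prod.acme.internal", 5432).
print(run_connection_test(lambda: True, lambda: True, lambda: False))
# prints "failed: scope"
```

Running the checks in this order matters: an authentication error is only meaningful once the host is known to be reachable, and a scope error only once the login works.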
Connection states, in plain language
Every connection moves through a short lifecycle. The status indicator on the connection card tells you where you are.
| State | What it means | What to do |
|---|---|---|
| Configuring | The connection is being set up; not active yet | Finish filling in the fields and save |
| Connected | Live and ready; downstream products can read from it | Start using it — discovery, EDA, search, fine-tuning |
| Needs attention | A test failed — authentication, host, or scope | Update what’s wrong and re-test |
| Paused | Temporarily disabled by a workspace admin | Resume when you’re ready |
Scoping a connection well
The most important decision is what to scope it to. The same principle that applies to connected apps applies twice as much to databases.
Scope each connection to a schema, not an entire server. Narrower scope produces sharper answers and tighter security. You can always add a second connection later for another schema.
A few patterns that pay off:
- One connection per logical purpose. “Production Orders,” “Marketing Events,” “Support Tickets” — not one mega-connection across everything.
- Read-only at the database level. Defense in depth. Even if a downstream product tried to write, the database wouldn’t let it.
- Document the owner. Use the Description field for “who owns this database and where to ask if something changes.”
- Pair the connection with a refresh cadence. Decide once whether asset inventory refreshes nightly, on demand, or both, and stick with it.
What you can do once a database is connected
A connected database becomes a first-class source across the rest of VDF AI Data. From the connection’s detail panel, every other capability is one click away:
- Discover what’s in it — see the tables, columns, sizes, and freshness without writing a query. See Discovering your data.
- Explore the shape of the data — missing values, duplicates, outliers, drift signals. See Exploring your data.
- Build feature lists — organize what’s interesting per use case. See Features and relationships.
- Index for semantic search — make text-heavy columns searchable by meaning. See Vector indexes and semantic search.
- Prepare a fine-tuning dataset — turn your real production data into training pairs. See Fine-tuning datasets.
You don’t have to plan to use all of these on day one. Connect, discover, and decide where to go from what you see.
What stays in your database (and what doesn’t)
This is the question almost everyone asks first. The short answer: your data stays in your database.
- VDF AI Data queries your database on demand — it does not copy your tables wholesale.
- Vector indexes and fine-tuning datasets are produced from a snapshot read at build time; you decide when to rebuild.
- Pausing or removing a connection stops all reads immediately.
- The database’s own access control stays in charge — VDF AI Data only ever sees what the connection account can see.
For the full picture of how VDF AI Data handles your data, see Privacy & Security.
Keeping connections healthy
A connection is a living relationship. A few small habits keep yours sharp.
Test on the day the environment was set up
Five minutes of testing right after setup catches the things that break later — firewall rules, password rotation policies, schema drift.
Refresh asset inventory after upstream changes
When your team adds or removes tables, the connection’s view of what’s available won’t update automatically. A manual Refresh inventory action keeps it in sync.
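Conceptually, a refresh reconciles the table list recorded earlier with what the database reports now. The sketch below illustrates that comparison with hypothetical table names; it is not a description of how VDF AI Data implements Refresh inventory.

```python
def inventory_changes(before: set[str], after: set[str]) -> dict[str, list[str]]:
    """Compare two table inventories and report what was added or removed."""
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical inventories: one captured at the last refresh, one from today.
last_refresh = {"orders", "customers", "legacy_orders"}
today = {"orders", "customers", "orders_archive"}

print(inventory_changes(last_refresh, today))
# prints {'added': ['orders_archive'], 'removed': ['legacy_orders']}
```

Until a refresh runs, anything in the "added" list is invisible to downstream features, and anything in the "removed" list may still be listed even though it no longer exists.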
Rotate credentials on a calendar
If your organization rotates database passwords, set a reminder to update the connection at the same time. Otherwise the connection’s status will move to “Needs attention” as soon as the old password stops working.
Document ownership
The Description field is small. Use it. “Owned by data-platform@. Source of truth for orders. Escalate via #data-platform on Slack.” That one line saves a teammate forty minutes.
A short troubleshooting list
| Symptom | Likely cause | What to try |
|---|---|---|
| “Host unreachable” on test | Firewall, VPC, or routing | Check egress allowlist; ask your platform team |
| “Authentication failed” on test | Wrong credentials or rotated password | Update credentials and re-test |
| “Database not found” on test | Wrong database/schema name or the account can’t see it | Confirm the name and the account’s grant |
| Connection works, but no tables appear | Asset inventory hasn’t been run yet | Trigger Refresh inventory from the connection |
| Worked yesterday, fails today | Password rotation, network change, or the database was paused | Re-test; update credentials if needed |
Where to go next
- Discovering your data — what shows up after you connect, and how to read it.
- Exploring your data — a one-click read on quality, completeness, and drift.
- Vector indexes and semantic search — make text columns searchable by meaning.
- Connecting sources — the broader picture: files, apps, and databases side-by-side.