Getting Started

1. Sign up

Sign in with GitHub at db.getdot.ai/auth/github. You get a warehouse and an API token instantly.

Or via CLI (no browser needed):

export DOT_DB_TOKEN=$(curl -s -X POST https://db.getdot.ai/api/projects \
  -H "Authorization: Bearer $(gh auth token)" | grep -o '"token":"[^"]*"' | cut -d'"' -f4)

2. Run your first query

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "SELECT 'hello from FoxDB'"

3. Query the shared dataset (31M rows, free)

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "SELECT county, round(avg(price)) AS avg_price, count(*) AS sales
      FROM shared.uk_price_paid
      WHERE date >= '2024-01-01'
      GROUP BY county ORDER BY avg_price DESC LIMIT 5 FORMAT JSON"

Loading Data

From any URL (recommended)

One line of SQL. No staging, no ETL pipeline.

-- CSV from HTTP
CREATE TABLE events AS SELECT * FROM read_csv_auto('https://example.com/events.csv');

-- Parquet from S3
CREATE TABLE logs AS SELECT * FROM read_parquet('s3://bucket/logs/*.parquet');

-- JSON
CREATE TABLE api_data AS SELECT * FROM read_json_auto('https://api.example.com/data.json');

-- Append to existing table
INSERT INTO events SELECT * FROM read_csv_auto('https://example.com/more.csv');

S3 access

SET s3_region='us-east-1';
SET s3_access_key_id='AKIA...';
SET s3_secret_access_key='...';
CREATE TABLE data AS SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');

SQL INSERT

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "INSERT INTO events VALUES ('2026-04-01', 'signup', 42)"

PostgreSQL COPY

-- Via psql on port 5432:
\copy my_table FROM 'data.csv' WITH (FORMAT CSV, HEADER)

COPY FROM STDIN is bulk-optimized: rows are buffered and loaded in a single transaction, 100-500x faster than individual INSERTs. Use COPY for any bulk data loading via Airbyte, dlt, or ETL tools.

Keyed Tables — High-Throughput Upserts

FoxDB's storage is append-only Parquet under DuckLake. That's perfect for analytics (immutable history, columnar compression, snapshot isolation), but it makes INSERT ... ON CONFLICT DO UPDATE against a growing table slow — every upsert rewrites a parquet file.

Keyed tables fix this. Declare a primary key and FoxDB transparently routes writes through an append-only raw shadow with a dedup view on top. Writes are O(1) regardless of table size. Reads see "latest value per key." Background compaction folds the raw shadow periodically.

When to use keyed tables

Create a new keyed table

-- Single-column key
CREATE TABLE events (
    id          UUID PRIMARY KEY,
    occurred_at TIMESTAMP,
    user_id     INTEGER,
    status      VARCHAR,
    payload     JSON
) WITH (foxdb.engine='replacing');

-- Composite key (any combination of columns)
CREATE TABLE tracking (
    message_id  VARCHAR,
    region      VARCHAR,
    start_time  TIMESTAMP,
    status      VARCHAR,
    PRIMARY KEY (message_id, region, start_time)
) WITH (foxdb.engine='replacing');

Write to it like any Postgres table — the dedup happens transparently:

INSERT INTO events (id, occurred_at, user_id, status)
VALUES ('a1b2', NOW(), 42, 'started')
ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status;

SELECT count(*) FROM events;  -- reads through the dedup view

Convert an existing table (zero-downtime)

If you already have data in a regular table, convert it with a phased migration. Writes and reads keep flowing through the source while the migration runs.

OPTIMIZE TABLE events SET KEY = (id)
  WITH (migration='phased', cursor='occurred_at');

-- Watch progress:
SELECT table_name, state FROM _foxdb_keyed_meta;
-- migrating → migrating_dual_write → swapping → active

The cursor column must be present in the table and monotonically orderable (TIMESTAMP, INTEGER, BIGINT, or ISO-8601 VARCHAR). Migration throughput on production hardware: ~7 million rows / minute.

Abort an in-flight migration (pre-cutover only):

ROLLBACK MIGRATION events;

Behavior notes

Querying

Output formats

FormatSQL suffixContent-Type
TabSeparated (default)-text/tab-separated-values
JSONFORMAT JSONapplication/json
CSVFORMAT CSVtext/csv
NDJSONFORMAT JSONEachRowapplication/x-ndjson

Schema discovery

SHOW SCHEMAS;
SHOW TABLES;
SHOW TABLES FROM tpcds;
DESCRIBE my_table;
SELECT * FROM information_schema.columns WHERE table_name = 'my_table';

Query priority

SELECT queries always get priority over writes (CREATE, INSERT, etc.). Writes are limited to half the connection pool so dashboards always get through, even under heavy ETL load.

Economy queue (50% billing)

Run batch jobs, ETL, and backfills at half price. Economy queries wait for idle capacity and use fewer CPU threads, so interactive queries always get priority.

-- Per-query
SET PRIORITY = 'economy';
SELECT * FROM big_table;

-- Per-user (all queries from this user run at economy)
ALTER USER etl SET PRIORITY ECONOMY;
ALTER USER etl SET PRIORITY STANDARD;  -- back to normal

-- Check current priority
SHOW PRIORITY;

Economy compute: $0.04/core-hour (50% of standard rate). Best for nightly ETL, batch aggregations, data exports, and backfills.

Manual compaction

DuckLake keeps small Parquet files consolidated on its own. For tables that have accumulated many small files, you can merge them explicitly:

CALL ducklake_merge_adjacent_files('db', 'main.my_table');

Time Travel & Snapshots

Every write creates an immutable snapshot. Query any table as it was at any point in time.

Query historical data

-- By version number
SELECT * FROM my_table AT (VERSION => 5);

-- By timestamp
SELECT * FROM my_table AT (TIMESTAMP => TIMESTAMP '2026-03-15 12:00:00');
SELECT * FROM my_table AT (TIMESTAMP => now() - INTERVAL '1 hour');

Browse snapshots

SHOW SNAPSHOTS;
SHOW SNAPSHOT;    -- current version

Change history

See every insert, update, and delete between two snapshot versions.

SHOW HISTORY FOR my_table;
SELECT * FROM db.table_changes('my_table', 10, 20);
SELECT * FROM db.table_insertions('my_table', 10, 20);
SELECT * FROM db.table_deletions('my_table', 10, 20);

Schema evolution

Add, drop, or rename columns without rewriting data files:

ALTER TABLE events ADD COLUMN region VARCHAR;
ALTER TABLE events DROP COLUMN old_field;
ALTER TABLE events RENAME old_name TO new_name;

Table comments

COMMENT ON TABLE events IS 'Application event stream';
COMMENT ON COLUMN events.ts IS 'Event timestamp (UTC)';

Configuration

SHOW OPTIONS;
CALL db.set_option('parquet_compression', 'zstd');
CALL db.set_option('target_file_size', '256MB');
CALL db.set_option('expire_older_than', '7 days');  -- extend snapshot retention

Users & Permissions

Create users

Each user gets a unique API token, shown once. Works for both HTTP and PostgreSQL.

CREATE USER analyst;
CREATE USER dashboard IN ROLE viewer;
CREATE USER reporter DEFAULT_ROLE = analyst;  -- Snowflake syntax

Manage users

SHOW USERS;
ALTER USER analyst SET ROLE admin;
ALTER USER analyst RESET PASSWORD;  -- generates new token
DROP USER dashboard;

Roles and grants

CREATE ROLE viewer;
CREATE ROLE editor;

GRANT SELECT ON * TO ROLE viewer;
GRANT SELECT, INSERT ON * TO ROLE editor;

GRANT ROLE viewer TO USER dashboard;
REVOKE ROLE editor FROM USER analyst;

SHOW ROLES;
SHOW GRANTS;

Budgets

Set spending limits in USD. Queries return HTTP 429 when exceeded.

SET BUDGET = 10.00;                         -- tenant-wide
SHOW BUDGET;
ALTER USER analyst SET BUDGET = 5.00;       -- per-user
SET BUDGET = 0;                             -- remove (unlimited)

Monitoring & Limits

Query history

Every query is logged with user, duration, CPU time, and cost.

SELECT * FROM meta.query_history ORDER BY started_at DESC LIMIT 20;

SELECT user_name, count(*) AS queries, sum(duration_ms)/1000 AS total_sec
FROM meta.query_history GROUP BY user_name;

Running queries

View and cancel in-flight queries via the Connection Hub or the API:

GET /api/queries              -- list running queries
POST /api/queries/{id}/cancel -- cancel a query
GET /api/queries/recent       -- last 20 completed (with cpu_ms, cost)

Limits

LimitValue
Max request body100 MB
Query timeout1 hour
Concurrent queries500 per tenant
Write concurrency16 (half of pool, SELECTs unaffected)
Hot storage10 TB compressed
Cold storageUnlimited

Connecting Tools

Two interfaces

InterfacePortUse for
HTTPS API443AI agents, curl, scripts
PostgreSQL5432Tableau, Metabase, dbt, psql

Same token, same SQL, same data.

curl

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "SELECT 1 FORMAT JSON"

Python

import httpx
r = httpx.post("https://db.getdot.ai/",
    headers={"X-GetDot-Key": token},
    content="SELECT 1")
print(r.text)

psql

psql "postgresql://default:$DOT_DB_TOKEN@db.getdot.ai:5432/db"

dbt

# ~/.dbt/profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: db.getdot.ai
      port: 5432
      user: default
      pass: "dot_YOUR_TOKEN"
      dbname: db
      schema: main
      threads: 4

All dbt materializations (view, table, incremental, ephemeral, snapshot) work.

Tableau / Metabase / Grafana

Connector:  PostgreSQL
Server:     db.getdot.ai
Port:       5432
Database:   db
Username:   default
Password:   <your token>
SSL:        Require

Snowflake Migration

Step 1: Export from Snowflake

COPY INTO @my_stage/export/ FROM my_table FILE_FORMAT = (TYPE = PARQUET);

Step 2: Import into FoxDB

CREATE TABLE my_table AS
  SELECT * FROM read_parquet('s3://my-stage-bucket/export/*.parquet');

SQL compatibility

SnowflakeFoxDB (DuckDB)
FLATTEN()unnest()
IFF(cond, a, b)IF(cond, a, b)
TRY_CAST(x AS INT)TRY_CAST(x AS INTEGER)
DATEDIFF('day', a, b)date_diff('day', a, b)
TO_DATE('2026-01-01')DATE '2026-01-01'
VARIANTJSON
OBJECT_CONSTRUCT()struct_pack()
QUALIFYQUALIFY (same)
ARRAY_AGG()array_agg() (same)

SQL Reference

FoxDB uses standard PostgreSQL SQL. Any SQL you'd write for Postgres runs here unmodified. On top of that, you get DuckDB quality-of-life extensions like GROUP BY ALL, EXCLUDE, QUALIFY, and PIVOT.

Data types

TypeDescription
INTEGER / BIGINT32/64-bit integer
DOUBLE64-bit float
DECIMAL(p,s)Fixed precision
VARCHARVariable-length text
BOOLEANtrue/false
DATECalendar date
TIMESTAMPDate + time
JSONJSON data
LISTArray/list
STRUCTNamed fields
MAPKey-value pairs

DuckDB bonus syntax

These work in addition to standard PostgreSQL SQL:

-- GROUP BY ALL: auto-groups by non-aggregate columns
SELECT county, avg(price) FROM uk_price_paid GROUP BY ALL;
SELECT * EXCLUDE (postcode1, postcode2) FROM uk_price_paid;
PIVOT uk_price_paid ON type USING avg(price) GROUP BY county;
SELECT *, ROW_NUMBER() OVER (PARTITION BY county ORDER BY price DESC) AS rn
FROM uk_price_paid QUALIFY rn <= 3;

API Reference

Authentication

MethodExample
HeaderX-GetDot-Key: dot_prefix_secret
Basic Authcurl --user 'default:token' ...
PG passwordpsql -U default -W

Endpoints

MethodPathDescription
POST/Execute SQL query
POST/api/projectsCreate project (GitHub PAT)
GET/api/meCurrent user info
GET/api/usageQueries, CPU-seconds, storage
GET/api/queriesRunning queries
GET/api/queries/recentLast 20 queries (with cost)
POST/api/queries/{id}/cancelCancel a running query
GET/healthHealth check

Response headers

HeaderDescription
X-Query-Duration-MsWall-clock query time in milliseconds
X-RowsApproximate row count

Errors

CodeMeaning
400SQL syntax error, invalid query, or catalog error
401Authentication failed
402No credits remaining
403Permission denied (RBAC)
413Request body exceeds 100 MB
429Budget exceeded or queue full
503Out of memory or server busy
504Query timeout (1 hour)