Getting Started
1. Sign up
Sign in with GitHub at db.getdot.ai/auth/github. You get a warehouse and an API token instantly.
Or via CLI (no browser needed):
export DOT_DB_TOKEN=$(curl -s -X POST https://db.getdot.ai/api/projects \
-H "Authorization: Bearer $(gh auth token)" | grep -o '"token":"[^"]*"' | cut -d'"' -f4)
2. Run your first query
curl -X POST https://db.getdot.ai/ \
-H "X-GetDot-Key: $DOT_DB_TOKEN" \
-d "SELECT 'hello from FoxDB'"
3. Query the shared dataset (31M rows, free)
curl -X POST https://db.getdot.ai/ \
-H "X-GetDot-Key: $DOT_DB_TOKEN" \
-d "SELECT county, round(avg(price)) AS avg_price, count(*) AS sales
FROM shared.uk_price_paid
WHERE date >= '2024-01-01'
GROUP BY county ORDER BY avg_price DESC LIMIT 5 FORMAT JSON"
Loading Data
From any URL (recommended)
One line of SQL. No staging, no ETL pipeline.
-- CSV from HTTP
CREATE TABLE events AS SELECT * FROM read_csv_auto('https://example.com/events.csv');
-- Parquet from S3
CREATE TABLE logs AS SELECT * FROM read_parquet('s3://bucket/logs/*.parquet');
-- JSON
CREATE TABLE api_data AS SELECT * FROM read_json_auto('https://api.example.com/data.json');
-- Append to existing table
INSERT INTO events SELECT * FROM read_csv_auto('https://example.com/more.csv');
S3 access
SET s3_region='us-east-1';
SET s3_access_key_id='AKIA...';
SET s3_secret_access_key='...';
CREATE TABLE data AS SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');
SQL INSERT
curl -X POST https://db.getdot.ai/ \
-H "X-GetDot-Key: $DOT_DB_TOKEN" \
-d "INSERT INTO events VALUES ('2026-04-01', 'signup', 42)"
PostgreSQL COPY
-- Via psql on port 5432:
\copy my_table FROM 'data.csv' WITH (FORMAT CSV, HEADER)
COPY FROM STDIN is bulk-optimized: rows are buffered and loaded in a single transaction, 100-500x faster than individual INSERTs. Use COPY for any bulk data loading via Airbyte, dlt, or ETL tools.
Keyed Tables — High-Throughput Upserts
FoxDB's storage is append-only Parquet under DuckLake. That's perfect for analytics (immutable history, columnar compression, snapshot isolation), but it makes INSERT ... ON CONFLICT DO UPDATE against a growing table slow — every upsert rewrites a parquet file.
Keyed tables fix this. Declare a primary key and FoxDB transparently routes writes through an append-only raw shadow with a dedup view on top. Writes are O(1) regardless of table size. Reads see "latest value per key." Background compaction folds the raw shadow periodically.
When to use keyed tables
- Use it if your table receives
INSERT ... ON CONFLICT DO UPDATEtraffic — tracking, event-stream, session-state, status-update tables. - Use it if you have multiple concurrent writers hitting the same table (Celery-style worker pools, real-time ingestion).
- Don't use it for INSERT-only logs (append is already fast) or small reference tables (overhead not justified).
- Don't use it if you need to see all historical versions of a row — keyed tables collapse to "latest per key."
Create a new keyed table
-- Single-column key
CREATE TABLE events (
id UUID PRIMARY KEY,
occurred_at TIMESTAMP,
user_id INTEGER,
status VARCHAR,
payload JSON
) WITH (foxdb.engine='replacing');
-- Composite key (any combination of columns)
CREATE TABLE tracking (
message_id VARCHAR,
region VARCHAR,
start_time TIMESTAMP,
status VARCHAR,
PRIMARY KEY (message_id, region, start_time)
) WITH (foxdb.engine='replacing');
Write to it like any Postgres table — the dedup happens transparently:
INSERT INTO events (id, occurred_at, user_id, status)
VALUES ('a1b2', NOW(), 42, 'started')
ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status;
SELECT count(*) FROM events; -- reads through the dedup view
Convert an existing table (zero-downtime)
If you already have data in a regular table, convert it with a phased migration. Writes and reads keep flowing through the source while the migration runs.
OPTIMIZE TABLE events SET KEY = (id)
WITH (migration='phased', cursor='occurred_at');
-- Watch progress:
SELECT table_name, state FROM _foxdb_keyed_meta;
-- migrating → migrating_dual_write → swapping → active
The cursor column must be present in the table and monotonically orderable (TIMESTAMP, INTEGER, BIGINT, or ISO-8601 VARCHAR). Migration throughput on production hardware: ~7 million rows / minute.
Abort an in-flight migration (pre-cutover only):
ROLLBACK MIGRATION events;
Behavior notes
- Reads: always see the latest value for each key. The dedup view applies
row_number() OVER (PARTITION BY key ORDER BY _seq DESC)and filters tombstones — happens automatically when youSELECT FROM events. - Cutover window: during the brief swap step of phased migrations, a small percentage (~1-2%) of in-flight writes may see a transient binder error. Their write-side fan-out already committed to the raw shadow, so the data is in the dedup view; clients with retry logic see zero net errors.
- DELETE writes a tombstone (the dedup view filters it out).
DELETE FROM events WHERE id = ...works as expected. - UPDATE writes a new raw row with the updated values; the dedup view picks it as latest.
Querying
Output formats
| Format | SQL suffix | Content-Type |
|---|---|---|
| TabSeparated (default) | - | text/tab-separated-values |
| JSON | FORMAT JSON | application/json |
| CSV | FORMAT CSV | text/csv |
| NDJSON | FORMAT JSONEachRow | application/x-ndjson |
Schema discovery
SHOW SCHEMAS;
SHOW TABLES;
SHOW TABLES FROM tpcds;
DESCRIBE my_table;
SELECT * FROM information_schema.columns WHERE table_name = 'my_table';
Query priority
SELECT queries always get priority over writes (CREATE, INSERT, etc.). Writes are limited to half the connection pool so dashboards always get through, even under heavy ETL load.
Economy queue (50% billing)
Run batch jobs, ETL, and backfills at half price. Economy queries wait for idle capacity and use fewer CPU threads, so interactive queries always get priority.
-- Per-query
SET PRIORITY = 'economy';
SELECT * FROM big_table;
-- Per-user (all queries from this user run at economy)
ALTER USER etl SET PRIORITY ECONOMY;
ALTER USER etl SET PRIORITY STANDARD; -- back to normal
-- Check current priority
SHOW PRIORITY;
Economy compute: $0.04/core-hour (50% of standard rate). Best for nightly ETL, batch aggregations, data exports, and backfills.
Manual compaction
DuckLake keeps small Parquet files consolidated on its own. For tables that have accumulated many small files, you can merge them explicitly:
CALL ducklake_merge_adjacent_files('db', 'main.my_table');
Time Travel & Snapshots
Every write creates an immutable snapshot. Query any table as it was at any point in time.
Query historical data
-- By version number
SELECT * FROM my_table AT (VERSION => 5);
-- By timestamp
SELECT * FROM my_table AT (TIMESTAMP => TIMESTAMP '2026-03-15 12:00:00');
SELECT * FROM my_table AT (TIMESTAMP => now() - INTERVAL '1 hour');
Browse snapshots
SHOW SNAPSHOTS;
SHOW SNAPSHOT; -- current version
Change history
See every insert, update, and delete between two snapshot versions.
SHOW HISTORY FOR my_table;
SELECT * FROM db.table_changes('my_table', 10, 20);
SELECT * FROM db.table_insertions('my_table', 10, 20);
SELECT * FROM db.table_deletions('my_table', 10, 20);
Schema evolution
Add, drop, or rename columns without rewriting data files:
ALTER TABLE events ADD COLUMN region VARCHAR;
ALTER TABLE events DROP COLUMN old_field;
ALTER TABLE events RENAME old_name TO new_name;
Table comments
COMMENT ON TABLE events IS 'Application event stream';
COMMENT ON COLUMN events.ts IS 'Event timestamp (UTC)';
Configuration
SHOW OPTIONS;
CALL db.set_option('parquet_compression', 'zstd');
CALL db.set_option('target_file_size', '256MB');
CALL db.set_option('expire_older_than', '7 days'); -- extend snapshot retention
Users & Permissions
Create users
Each user gets a unique API token, shown once. Works for both HTTP and PostgreSQL.
CREATE USER analyst;
CREATE USER dashboard IN ROLE viewer;
CREATE USER reporter DEFAULT_ROLE = analyst; -- Snowflake syntax
Manage users
SHOW USERS;
ALTER USER analyst SET ROLE admin;
ALTER USER analyst RESET PASSWORD; -- generates new token
DROP USER dashboard;
Roles and grants
CREATE ROLE viewer;
CREATE ROLE editor;
GRANT SELECT ON * TO ROLE viewer;
GRANT SELECT, INSERT ON * TO ROLE editor;
GRANT ROLE viewer TO USER dashboard;
REVOKE ROLE editor FROM USER analyst;
SHOW ROLES;
SHOW GRANTS;
Budgets
Set spending limits in USD. Queries return HTTP 429 when exceeded.
SET BUDGET = 10.00; -- tenant-wide
SHOW BUDGET;
ALTER USER analyst SET BUDGET = 5.00; -- per-user
SET BUDGET = 0; -- remove (unlimited)
Monitoring & Limits
Query history
Every query is logged with user, duration, CPU time, and cost.
SELECT * FROM meta.query_history ORDER BY started_at DESC LIMIT 20;
SELECT user_name, count(*) AS queries, sum(duration_ms)/1000 AS total_sec
FROM meta.query_history GROUP BY user_name;
Running queries
View and cancel in-flight queries via the Connection Hub or the API:
GET /api/queries -- list running queries
POST /api/queries/{id}/cancel -- cancel a query
GET /api/queries/recent -- last 20 completed (with cpu_ms, cost)
Limits
| Limit | Value |
|---|---|
| Max request body | 100 MB |
| Query timeout | 1 hour |
| Concurrent queries | 500 per tenant |
| Write concurrency | 16 (half of pool, SELECTs unaffected) |
| Hot storage | 10 TB compressed |
| Cold storage | Unlimited |
Connecting Tools
Two interfaces
| Interface | Port | Use for |
|---|---|---|
| HTTPS API | 443 | AI agents, curl, scripts |
| PostgreSQL | 5432 | Tableau, Metabase, dbt, psql |
Same token, same SQL, same data.
curl
curl -X POST https://db.getdot.ai/ \
-H "X-GetDot-Key: $DOT_DB_TOKEN" \
-d "SELECT 1 FORMAT JSON"
Python
import httpx
r = httpx.post("https://db.getdot.ai/",
headers={"X-GetDot-Key": token},
content="SELECT 1")
print(r.text)
psql
psql "postgresql://default:$DOT_DB_TOKEN@db.getdot.ai:5432/db"
dbt
# ~/.dbt/profiles.yml
my_project:
target: prod
outputs:
prod:
type: postgres
host: db.getdot.ai
port: 5432
user: default
pass: "dot_YOUR_TOKEN"
dbname: db
schema: main
threads: 4
All dbt materializations (view, table, incremental, ephemeral, snapshot) work.
Tableau / Metabase / Grafana
Connector: PostgreSQL
Server: db.getdot.ai
Port: 5432
Database: db
Username: default
Password: <your token>
SSL: Require
Snowflake Migration
Step 1: Export from Snowflake
COPY INTO @my_stage/export/ FROM my_table FILE_FORMAT = (TYPE = PARQUET);
Step 2: Import into FoxDB
CREATE TABLE my_table AS
SELECT * FROM read_parquet('s3://my-stage-bucket/export/*.parquet');
SQL compatibility
| Snowflake | FoxDB (DuckDB) |
|---|---|
FLATTEN() | unnest() |
IFF(cond, a, b) | IF(cond, a, b) |
TRY_CAST(x AS INT) | TRY_CAST(x AS INTEGER) |
DATEDIFF('day', a, b) | date_diff('day', a, b) |
TO_DATE('2026-01-01') | DATE '2026-01-01' |
VARIANT | JSON |
OBJECT_CONSTRUCT() | struct_pack() |
QUALIFY | QUALIFY (same) |
ARRAY_AGG() | array_agg() (same) |
SQL Reference
FoxDB uses standard PostgreSQL SQL. Any SQL you'd write for Postgres runs here unmodified. On top of that, you get DuckDB quality-of-life extensions like GROUP BY ALL, EXCLUDE, QUALIFY, and PIVOT.
Data types
| Type | Description |
|---|---|
INTEGER / BIGINT | 32/64-bit integer |
DOUBLE | 64-bit float |
DECIMAL(p,s) | Fixed precision |
VARCHAR | Variable-length text |
BOOLEAN | true/false |
DATE | Calendar date |
TIMESTAMP | Date + time |
JSON | JSON data |
LIST | Array/list |
STRUCT | Named fields |
MAP | Key-value pairs |
DuckDB bonus syntax
These work in addition to standard PostgreSQL SQL:
-- GROUP BY ALL: auto-groups by non-aggregate columns
SELECT county, avg(price) FROM uk_price_paid GROUP BY ALL;
SELECT * EXCLUDE (postcode1, postcode2) FROM uk_price_paid;
PIVOT uk_price_paid ON type USING avg(price) GROUP BY county;
SELECT *, ROW_NUMBER() OVER (PARTITION BY county ORDER BY price DESC) AS rn
FROM uk_price_paid QUALIFY rn <= 3;
API Reference
Authentication
| Method | Example |
|---|---|
| Header | X-GetDot-Key: dot_prefix_secret |
| Basic Auth | curl --user 'default:token' ... |
| PG password | psql -U default -W |
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | / | Execute SQL query |
| POST | /api/projects | Create project (GitHub PAT) |
| GET | /api/me | Current user info |
| GET | /api/usage | Queries, CPU-seconds, storage |
| GET | /api/queries | Running queries |
| GET | /api/queries/recent | Last 20 queries (with cost) |
| POST | /api/queries/{id}/cancel | Cancel a running query |
| GET | /health | Health check |
Response headers
| Header | Description |
|---|---|
X-Query-Duration-Ms | Wall-clock query time in milliseconds |
X-Rows | Approximate row count |
Errors
| Code | Meaning |
|---|---|
| 400 | SQL syntax error, invalid query, or catalog error |
| 401 | Authentication failed |
| 402 | No credits remaining |
| 403 | Permission denied (RBAC) |
| 413 | Request body exceeds 100 MB |
| 429 | Budget exceeded or queue full |
| 503 | Out of memory or server busy |
| 504 | Query timeout (1 hour) |
by