FoxDB — Docs

Getting Started

1. Sign up

Or via CLI (no browser needed):

export DOT_DB_TOKEN=$(curl -s -X POST https://db.getdot.ai/api/projects \
  -H "Authorization: Bearer $(gh auth token)" | grep -o '"token":"[^"]*"' | cut -d'"' -f4)

2. Run your first query

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "SELECT 'hello from FoxDB'"

3. Query the shared dataset (31M rows, free)

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "SELECT county, round(avg(price)) AS avg_price, count(*) AS sales
      FROM shared.uk_price_paid
      WHERE date >= '2024-01-01'
      GROUP BY county ORDER BY avg_price DESC LIMIT 5 FORMAT JSON"

Loading Data

From any URL (recommended)

One line of SQL. No staging, no ETL pipeline.

-- CSV from HTTP
CREATE TABLE events AS SELECT * FROM read_csv_auto('https://example.com/events.csv');

-- Parquet from S3
CREATE TABLE logs AS SELECT * FROM read_parquet('s3://bucket/logs/*.parquet');

-- JSON
CREATE TABLE api_data AS SELECT * FROM read_json_auto('https://api.example.com/data.json');

-- Append to existing table
INSERT INTO events SELECT * FROM read_csv_auto('https://example.com/more.csv');

S3 access

SET s3_region='us-east-1';
SET s3_access_key_id='AKIA...';
SET s3_secret_access_key='...';
CREATE TABLE data AS SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet');

SQL INSERT

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "INSERT INTO events VALUES ('2026-04-01', 'signup', 42)"

PostgreSQL COPY

-- Via psql on port 5432:
\copy my_table FROM 'data.csv' WITH (FORMAT CSV, HEADER)

COPY FROM STDIN is bulk-optimized: rows are buffered and loaded in a single transaction, 100-500x faster than individual INSERTs. Use COPY for any bulk data loading via Airbyte, dlt, or ETL tools.

Keyed Tables — High-Throughput Upserts

FoxDB's storage is append-only Parquet under DuckLake. That's perfect for analytics (immutable history, columnar compression, snapshot isolation), but it makes INSERT ... ON CONFLICT DO UPDATE against a growing table slow — every upsert rewrites a parquet file.

Keyed tables fix this. Declare a primary key and FoxDB transparently routes writes through an append-only raw shadow with a dedup view on top. Writes are O(1) regardless of table size. Reads see "latest value per key." Background compaction folds the raw shadow periodically.

When to use keyed tables

Use it if your table receives INSERT ... ON CONFLICT DO UPDATE traffic — tracking, event-stream, session-state, status-update tables.
Use it if you have multiple concurrent writers hitting the same table (Celery-style worker pools, real-time ingestion).
Don't use it for INSERT-only logs (append is already fast) or small reference tables (overhead not justified).
Don't use it if you need to see all historical versions of a row — keyed tables collapse to "latest per key."

Create a new keyed table

-- Single-column key
CREATE TABLE events (
    id          UUID PRIMARY KEY,
    occurred_at TIMESTAMP,
    user_id     INTEGER,
    status      VARCHAR,
    payload     JSON
) WITH (foxdb.engine='replacing');

-- Composite key (any combination of columns)
CREATE TABLE tracking (
    message_id  VARCHAR,
    region      VARCHAR,
    start_time  TIMESTAMP,
    status      VARCHAR,
    PRIMARY KEY (message_id, region, start_time)
) WITH (foxdb.engine='replacing');

Write to it like any Postgres table — the dedup happens transparently:

INSERT INTO events (id, occurred_at, user_id, status)
VALUES ('a1b2', NOW(), 42, 'started')
ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status;

SELECT count(*) FROM events;  -- reads through the dedup view

Convert an existing table (zero-downtime)

If you already have data in a regular table, convert it with a phased migration. Writes and reads keep flowing through the source while the migration runs.

OPTIMIZE TABLE events SET KEY = (id)
  WITH (migration='phased', cursor='occurred_at');

-- Watch progress:
SELECT table_name, state FROM _foxdb_keyed_meta;
-- migrating → migrating_dual_write → swapping → active

The cursor column must be present in the table and monotonically orderable (TIMESTAMP, INTEGER, BIGINT, or ISO-8601 VARCHAR). Migration throughput on production hardware: ~7 million rows / minute.

Abort an in-flight migration (pre-cutover only):

ROLLBACK MIGRATION events;

Behavior notes

Reads: always see the latest value for each key. The dedup view applies row_number() OVER (PARTITION BY key ORDER BY _seq DESC) and filters tombstones — happens automatically when you SELECT FROM events.
Cutover window: during the brief swap step of phased migrations, a small percentage (~1-2%) of in-flight writes may see a transient binder error. Their write-side fan-out already committed to the raw shadow, so the data is in the dedup view; clients with retry logic see zero net errors.
DELETE writes a tombstone (the dedup view filters it out). DELETE FROM events WHERE id = ... works as expected.
UPDATE writes a new raw row with the updated values; the dedup view picks it as latest.

Querying

Output formats

Format	SQL suffix	Content-Type
TabSeparated (default)	`-`	text/tab-separated-values
JSON	`FORMAT JSON`	application/json
CSV	`FORMAT CSV`	text/csv
NDJSON	`FORMAT JSONEachRow`	application/x-ndjson

Schema discovery

SHOW SCHEMAS;
SHOW TABLES;
SHOW TABLES FROM tpcds;
DESCRIBE my_table;
SELECT * FROM information_schema.columns WHERE table_name = 'my_table';

Query priority

SELECT queries always get priority over writes (CREATE, INSERT, etc.). Writes are limited to half the connection pool so dashboards always get through, even under heavy ETL load.

Economy queue (50% billing)

Run batch jobs, ETL, and backfills at half price. Economy queries wait for idle capacity and use fewer CPU threads, so interactive queries always get priority.

-- Per-query
SET PRIORITY = 'economy';
SELECT * FROM big_table;

-- Per-user (all queries from this user run at economy)
ALTER USER etl SET PRIORITY ECONOMY;
ALTER USER etl SET PRIORITY STANDARD;  -- back to normal

-- Check current priority
SHOW PRIORITY;

Economy compute: $0.04/core-hour (50% of standard rate). Best for nightly ETL, batch aggregations, data exports, and backfills.

Manual compaction

DuckLake keeps small Parquet files consolidated on its own. For tables that have accumulated many small files, you can merge them explicitly:

CALL ducklake_merge_adjacent_files('db', 'main.my_table');

Time Travel & Snapshots

Every write creates an immutable snapshot. Query any table as it was at any point in time.

Query historical data

-- By version number
SELECT * FROM my_table AT (VERSION => 5);

-- By timestamp
SELECT * FROM my_table AT (TIMESTAMP => TIMESTAMP '2026-03-15 12:00:00');
SELECT * FROM my_table AT (TIMESTAMP => now() - INTERVAL '1 hour');

Browse snapshots

SHOW SNAPSHOTS;
SHOW SNAPSHOT;    -- current version

Change history

See every insert, update, and delete between two snapshot versions.

SHOW HISTORY FOR my_table;
SELECT * FROM db.table_changes('my_table', 10, 20);
SELECT * FROM db.table_insertions('my_table', 10, 20);
SELECT * FROM db.table_deletions('my_table', 10, 20);

Schema evolution

Add, drop, or rename columns without rewriting data files:

ALTER TABLE events ADD COLUMN region VARCHAR;
ALTER TABLE events DROP COLUMN old_field;
ALTER TABLE events RENAME old_name TO new_name;

Table comments

COMMENT ON TABLE events IS 'Application event stream';
COMMENT ON COLUMN events.ts IS 'Event timestamp (UTC)';

Configuration

SHOW OPTIONS;
CALL db.set_option('parquet_compression', 'zstd');
CALL db.set_option('target_file_size', '256MB');
CALL db.set_option('expire_older_than', '7 days');  -- extend snapshot retention

Users & Permissions

Create users

Each user gets a unique API token, shown once. Works for both HTTP and PostgreSQL.

CREATE USER analyst;
CREATE USER dashboard IN ROLE viewer;
CREATE USER reporter DEFAULT_ROLE = analyst;  -- Snowflake syntax

Manage users

SHOW USERS;
ALTER USER analyst SET ROLE admin;
ALTER USER analyst RESET PASSWORD;  -- generates new token
DROP USER dashboard;

Roles and grants

CREATE ROLE viewer;
CREATE ROLE editor;

GRANT SELECT ON * TO ROLE viewer;
GRANT SELECT, INSERT ON * TO ROLE editor;

GRANT ROLE viewer TO USER dashboard;
REVOKE ROLE editor FROM USER analyst;

SHOW ROLES;
SHOW GRANTS;

Budgets

Set spending limits in USD. Queries return HTTP 429 when exceeded.

SET BUDGET = 10.00;                         -- tenant-wide
SHOW BUDGET;
ALTER USER analyst SET BUDGET = 5.00;       -- per-user
SET BUDGET = 0;                             -- remove (unlimited)

Monitoring & Limits

Query history

Every query is logged with user, duration, CPU time, and cost.

SELECT * FROM meta.query_history ORDER BY started_at DESC LIMIT 20;

SELECT user_name, count(*) AS queries, sum(duration_ms)/1000 AS total_sec
FROM meta.query_history GROUP BY user_name;

Running queries

View and cancel in-flight queries via the Connection Hub or the API:

GET /api/queries              -- list running queries
POST /api/queries/{id}/cancel -- cancel a query
GET /api/queries/recent       -- last 20 completed (with cpu_ms, cost)

Limits

Limit	Value
Max request body	100 MB
Query timeout	1 hour
Concurrent queries	500 per tenant
Write concurrency	16 (half of pool, SELECTs unaffected)
Hot storage	10 TB compressed
Cold storage	Unlimited

Connecting Tools

Two interfaces

Interface	Port	Use for
HTTPS API	443	AI agents, curl, scripts
PostgreSQL	5432	Tableau, Metabase, dbt, psql

Same token, same SQL, same data.

curl

curl -X POST https://db.getdot.ai/ \
  -H "X-GetDot-Key: $DOT_DB_TOKEN" \
  -d "SELECT 1 FORMAT JSON"

Python

import httpx
r = httpx.post("https://db.getdot.ai/",
    headers={"X-GetDot-Key": token},
    content="SELECT 1")
print(r.text)

psql

psql "postgresql://default:$DOT_DB_TOKEN@db.getdot.ai:5432/db"

dbt

# ~/.dbt/profiles.yml
my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: db.getdot.ai
      port: 5432
      user: default
      pass: "dot_YOUR_TOKEN"
      dbname: db
      schema: main
      threads: 4

All dbt materializations (view, table, incremental, ephemeral, snapshot) work.

Tableau / Metabase / Grafana

Connector:  PostgreSQL
Server:     db.getdot.ai
Port:       5432
Database:   db
Username:   default
Password:   <your token>
SSL:        Require

Snowflake Migration

Step 1: Export from Snowflake

COPY INTO @my_stage/export/ FROM my_table FILE_FORMAT = (TYPE = PARQUET);

Step 2: Import into FoxDB

CREATE TABLE my_table AS
  SELECT * FROM read_parquet('s3://my-stage-bucket/export/*.parquet');

SQL compatibility

Snowflake	FoxDB (DuckDB)
`FLATTEN()`	`unnest()`
`IFF(cond, a, b)`	`IF(cond, a, b)`
`TRY_CAST(x AS INT)`	`TRY_CAST(x AS INTEGER)`
`DATEDIFF('day', a, b)`	`date_diff('day', a, b)`
`TO_DATE('2026-01-01')`	`DATE '2026-01-01'`
`VARIANT`	`JSON`
`OBJECT_CONSTRUCT()`	`struct_pack()`
`QUALIFY`	`QUALIFY` (same)
`ARRAY_AGG()`	`array_agg()` (same)

SQL Reference

FoxDB uses standard PostgreSQL SQL. Any SQL you'd write for Postgres runs here unmodified. On top of that, you get DuckDB quality-of-life extensions like GROUP BY ALL, EXCLUDE, QUALIFY, and PIVOT.

Data types

Type	Description
`INTEGER` / `BIGINT`	32/64-bit integer
`DOUBLE`	64-bit float
`DECIMAL(p,s)`	Fixed precision
`VARCHAR`	Variable-length text
`BOOLEAN`	true/false
`DATE`	Calendar date
`TIMESTAMP`	Date + time
`JSON`	JSON data
`LIST`	Array/list
`STRUCT`	Named fields
`MAP`	Key-value pairs

DuckDB bonus syntax

These work in addition to standard PostgreSQL SQL:

-- GROUP BY ALL: auto-groups by non-aggregate columns
SELECT county, avg(price) FROM uk_price_paid GROUP BY ALL;
SELECT * EXCLUDE (postcode1, postcode2) FROM uk_price_paid;
PIVOT uk_price_paid ON type USING avg(price) GROUP BY county;
SELECT *, ROW_NUMBER() OVER (PARTITION BY county ORDER BY price DESC) AS rn
FROM uk_price_paid QUALIFY rn <= 3;

API Reference

Authentication

Method	Example
Header	`X-GetDot-Key: dot_prefix_secret`
Basic Auth	`curl --user 'default:token' ...`
PG password	`psql -U default -W`

Endpoints

Method	Path	Description
POST	`/`	Execute SQL query
POST	`/api/projects`	Create project (GitHub PAT)
GET	`/api/me`	Current user info
GET	`/api/usage`	Queries, CPU-seconds, storage
GET	`/api/queries`	Running queries
GET	`/api/queries/recent`	Last 20 queries (with cost)
POST	`/api/queries/{id}/cancel`	Cancel a running query
GET	`/health`	Health check

Response headers

Header	Description
`X-Query-Duration-Ms`	Wall-clock query time in milliseconds
`X-Rows`	Approximate row count

Errors

Code	Meaning
400	SQL syntax error, invalid query, or catalog error
401	Authentication failed
402	No credits remaining
403	Permission denied (RBAC)
413	Request body exceeds 100 MB
429	Budget exceeded or queue full
503	Out of memory or server busy
504	Query timeout (1 hour)