# FoxDB > Managed analytical database. PostgreSQL-compatible with DuckDB extensions. Two interfaces: HTTPS API + PostgreSQL wire protocol. Works with every PostgreSQL client and tool. Import data from any URL with one line of SQL. FoxDB is an AI-first analytical database. 10GB storage + 10K CPU-seconds free. No credit card required. ## Quick Start ```bash # 1. Get a token (one command, needs GitHub CLI): export DOT_DB_TOKEN=$(curl -s -X POST https://db.getdot.ai/api/projects \ -H "Authorization: Bearer $(gh auth token)" | grep -o '"token":"[^"]*"' | cut -d'"' -f4) # 2. Query: curl -X POST https://db.getdot.ai/ \ -H "X-GetDot-Key: $DOT_DB_TOKEN" \ -d "SELECT 'hello from FoxDB'" # 3. Import data from any URL: curl -X POST https://db.getdot.ai/ \ -H "X-GetDot-Key: $DOT_DB_TOKEN" \ -d "CREATE TABLE my_data AS SELECT * FROM read_csv_auto('https://example.com/data.csv')" ``` No SDK, no driver, no connection pooling needed. ## Two Interfaces ### HTTPS API (port 443) — for AI agents, curl, scripts ``` POST https://db.getdot.ai/ X-GetDot-Key: SELECT count(*) FROM my_table FORMAT JSON ``` ### PostgreSQL wire protocol (port 5432) — for Tableau, Metabase, dbt, psql ``` psql "postgresql://default:@db.getdot.ai:5432/db" ``` Same token, same SQL, same data. Both interfaces access the same database. ## PostgreSQL Compatibility FoxDB is **PostgreSQL-compatible**. If it works with PostgreSQL, it works with FoxDB. Connect with any PostgreSQL client, use standard PostgreSQL SQL, and your existing tools and workflows just work. ### Verified tools | Tool | Status | |------|--------| | psycopg2 / psycopg | Works | | psql | Works | | dbt-postgres | Works (all materializations) | | SQLAlchemy | Works (including `inspect()`) | | JDBC (PostgreSQL) | Works | | Metabase | Works | | Tableau | Works | | Grafana | Works | | DBeaver | Works | | Airbyte | Works | ### What you can expect - Standard DDL: CREATE TABLE, ALTER TABLE, DROP TABLE with all common types (SERIAL, JSONB, UUID, TIMESTAMP, arrays, etc.) - Standard DML: INSERT, UPDATE, DELETE with parameterized queries ($1, $2) - Upserts: INSERT ... ON CONFLICT DO UPDATE / DO NOTHING - Schema introspection: pg_catalog system tables, information_schema views — tools discover your schema automatically - COPY FROM STDIN: Bulk loading via `\copy` in psql or any PG client (100-500x faster than individual INSERTs) - SET/SHOW, BEGIN/COMMIT, CREATE INDEX, VACUUM, GRANT — all accepted ### Known limitations - PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints are accepted but not enforced. Data integrity is the client's responsibility. - Integer division: PostgreSQL `5/2=2`, DuckDB `5/2=2.5`. - ROLLBACK does not actually roll back. Each statement autocommits. ## Authentication 1. **Header auth** (recommended for AI agents): ``` X-GetDot-Key: dot__ ``` 2. **Basic auth** (curl, HTTP clients): ``` curl --user 'default:' https://db.getdot.ai/ -d "SELECT 1" ``` 3. **PostgreSQL password** (Tableau, psql, JDBC): ``` Host: db.getdot.ai Port: 5432 User: default Password: ``` ## Data Loading Six ways to get data in. Pick what matches your source. ### 1. From a URL (CSV / Parquet / JSON) ```sql CREATE TABLE events AS SELECT * FROM read_csv_auto('https://example.com/events.csv'); CREATE TABLE logs AS SELECT * FROM read_parquet('s3://bucket/logs/*.parquet'); CREATE TABLE api AS SELECT * FROM read_json_auto('https://api.example.com/data.json'); INSERT INTO events SELECT * FROM read_csv_auto('https://example.com/more.csv'); ``` Supports `http://`, `https://`, `s3://`, `gs://`, `az://`, `r2://`. Globs work: `read_parquet('s3://bucket/2026/*/*.parquet')`. ### 2. From your own S3 / GCS / Azure bucket Create a session-scoped secret with your keys, then read: ```sql CREATE SECRET my_s3 ( TYPE S3, KEY_ID 'AKIA…', SECRET '…', REGION 'us-east-1', SCOPE 's3://my-bucket/' ); SELECT * FROM read_parquet('s3://my-bucket/logs/*.parquet') LIMIT 10; ``` Secret lives for the session only. GCS uses `TYPE GCS`; Azure uses `TYPE AZURE`. ### 3. Attach a remote Postgres database ```sql ATTACH 'host=mypg.example.com port=5432 user=reader password=… dbname=events connect_timeout=5' AS src (TYPE postgres); -- Query remote schema directly SELECT count(*) FROM src.public.orders; -- Snapshot into FoxDB CREATE TABLE orders AS SELECT * FROM src.public.orders; ``` Any publicly reachable PostgreSQL endpoint works. ### 4. Iceberg and Delta Lake tables ```sql CREATE TABLE orders AS SELECT * FROM iceberg_scan('s3://bucket/warehouse/orders'); CREATE TABLE payments AS SELECT * FROM delta_scan('s3://bucket/payments'); ``` Same credential rules as (1)–(2). For private buckets, `CREATE SECRET` first. ### 5. Bulk load via `COPY FROM STDIN` Fastest path for >10K rows — 100–500× faster than individual INSERTs. ```bash psql "postgresql://default:$DOT_DB_TOKEN@db.getdot.ai:5432/db?sslmode=require" <<'SQL' COPY events(id, user_id, value) FROM STDIN WITH (FORMAT csv); 1,alice,42.5 2,bob,17.3 \. SQL ``` ```python with open('events.csv') as f: cur.copy_expert("COPY events(id, user_id, value) FROM STDIN WITH (FORMAT csv, HEADER true)", f) ``` ### 6. Small `INSERT` via HTTP For streaming events, webhooks, and app writes. ```bash curl -X POST https://db.getdot.ai/ \ -H "X-GetDot-Key: $DOT_DB_TOKEN" \ -d "INSERT INTO events (ts, event, user_id) VALUES ('2026-04-01', 'signup', 42)" ``` Rows are coalesced into one DuckLake snapshot every ~3 seconds or 10K rows. ### From Snowflake / BigQuery / other warehouses No direct reader. Export to Parquet, then use (1): ```sql -- Snowflake side: COPY INTO @my_stage/export/ FROM my_table FILE_FORMAT = (TYPE = PARQUET); -- FoxDB side: CREATE TABLE my_table AS SELECT * FROM read_parquet('s3://my-stage-bucket/export/*.parquet'); ``` BigQuery: `EXPORT DATA OPTIONS(uri='gs://…/part-*.parquet', format='PARQUET')`, then `read_parquet('gs://…')` in FoxDB. ## Keyed tables — high-throughput upserts DuckLake storage is append-only Parquet. That's great for analytics but makes `INSERT ... ON CONFLICT DO UPDATE` against a growing table slow — every upsert rewrites a parquet file. **Keyed tables** fix this by routing writes through an append-only raw shadow with a dedup view on top. Writes are O(1) regardless of table size; reads see "latest value per key." ### When to use keyed tables Use them if your table receives `INSERT ... ON CONFLICT DO UPDATE` traffic, has concurrent writers, or is a tracking / event-stream / session / status table. Don't use them for INSERT-only logs (append is already fast) or tables where you need historical versions of each row. **Heuristic for AI agents generating DDL on FoxDB**: if the table will receive any upsert traffic, declare a primary key and add `WITH (foxdb.engine='replacing')`. Otherwise stay regular. ### New keyed table ```sql -- Single-column key CREATE TABLE events ( id UUID PRIMARY KEY, occurred_at TIMESTAMP, user_id INTEGER, status VARCHAR ) WITH (foxdb.engine='replacing'); -- Composite key (any combination) CREATE TABLE tracking ( message_id VARCHAR, region VARCHAR, start_time TIMESTAMP, status VARCHAR, PRIMARY KEY (message_id, region, start_time) ) WITH (foxdb.engine='replacing'); -- Write as you would with any Postgres table: INSERT INTO events (id, occurred_at, user_id, status) VALUES ('a1b2', NOW(), 42, 'started') ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status; ``` ### Convert an existing table (zero-downtime phased migration) ```sql OPTIMIZE TABLE events SET KEY = (id) WITH (migration='phased', cursor='occurred_at'); -- Watch progress: SELECT table_name, state FROM _foxdb_keyed_meta; -- migrating → migrating_dual_write → swapping → active -- Abort an in-flight migration (pre-cutover only): ROLLBACK MIGRATION events; ``` Migration throughput on production hardware: ~7 M rows / minute. The cursor column must be monotonically orderable (TIMESTAMP, INTEGER, BIGINT, or ISO-8601 VARCHAR). Both reads and writes keep flowing through the source table during migration via dual-write fan-out. ### Behavior - **Reads**: `SELECT FROM events` transparently returns the latest value per key via a `row_number() OVER (PARTITION BY key ORDER BY _seq DESC)` dedup view; tombstones (deleted rows) filter out. - **DELETE** writes a tombstone — `DELETE FROM events WHERE id = ...` works as expected. - **UPDATE** writes a new raw row with the updated values; dedup view picks the latest. - **Cutover window**: during the brief swap step of a phased migration, ~1-2% of in-flight writes may see a transient binder error. Their raw-side write already committed, so the data is in the dedup view; clients with retry logic see zero net errors. ## Snowflake SQL compatibility For moving data, see Data Loading → "From Snowflake / BigQuery" above. Most Snowflake SQL runs unchanged in FoxDB (DuckDB). Key mappings: - `FLATTEN()` → `unnest()` - `IFF(cond, a, b)` → `IF(cond, a, b)` or `CASE WHEN` - `TRY_CAST(x AS INT)` → `TRY_CAST(x AS INTEGER)` - `ARRAY_AGG()` → `array_agg()` (same) - `QUALIFY` → `QUALIFY` (same — DuckDB supports it) - `OBJECT_CONSTRUCT()` → `struct_pack()` or `{'key': value}` - `DATE_TRUNC('month', col)` → `date_trunc('month', col)` (same) - `DATEDIFF('day', a, b)` → `date_diff('day', a, b)` - `TO_DATE('2026-01-01')` → `DATE '2026-01-01'` or `CAST('2026-01-01' AS DATE)` - `VARIANT` type → `JSON` type ## Query Format - **Input**: SQL in the POST body. Standard PostgreSQL SQL, plus DuckDB extensions. - **Default output**: TabSeparated (TSV) — one row per line, tabs between columns. - **JSON output**: Append `FORMAT JSON` to your SQL query. - **Other formats**: `FORMAT CSV`, `FORMAT JSONEachRow`. ## Connecting Tools ### Python (HTTP) ```python import httpx def query(sql: str, token: str) -> str: resp = httpx.post( "https://db.getdot.ai/", headers={"X-GetDot-Key": token}, content=sql.encode("utf-8"), ) resp.raise_for_status() return resp.text ``` ### Python (PostgreSQL) ```python import psycopg2 conn = psycopg2.connect( host="db.getdot.ai", port=5432, user="default", password="dot_YOUR_TOKEN", dbname="db", sslmode="require", ) cur = conn.cursor() cur.execute("SELECT count(*) FROM my_table") print(cur.fetchone()) ``` ### Node.js ```javascript const resp = await fetch("https://db.getdot.ai/", { method: "POST", headers: { "X-GetDot-Key": process.env.DOT_DB_TOKEN }, body: "SELECT 1 FORMAT JSON", }); const data = await resp.json(); ``` ### dbt ```yaml # ~/.dbt/profiles.yml my_project: target: prod outputs: prod: type: postgres host: db.getdot.ai port: 5432 user: default pass: "dot_YOUR_TOKEN" dbname: db schema: main threads: 4 ``` ### Tableau / Metabase / Grafana ``` Connector: PostgreSQL Server: db.getdot.ai Port: 5432 Database: db Username: default Password: SSL: Require ``` ### psql ```bash psql "postgresql://default:dot_YOUR_TOKEN@db.getdot.ai:5432/db?sslmode=require" ``` ## Shared Datasets Every account includes free access to shared analytical datasets: ### UK Property Prices (31M rows, 1995-2026) ```sql SELECT county, round(avg(price)) AS avg_price, count(*) AS sales FROM shared.uk_price_paid WHERE date >= '2024-01-01' GROUP BY county ORDER BY avg_price DESC LIMIT 10 ``` Columns: price, date, postcode1, postcode2, type (terraced/semi-detached/detached/flat/other), is_new, duration (freehold/leasehold), addr1, addr2, street, locality, town, district, county. ## Access Control (RBAC) ### Users Each user gets a unique API token (shown once, like a GitHub PAT). Same token works for HTTP and PostgreSQL. ```sql CREATE USER analyst; -- generates token CREATE USER dashboard IN ROLE viewer; -- with role assignment CREATE USER reporter DEFAULT_ROLE = analyst; -- Snowflake syntax CREATE USER migrated WITH PASSWORD 'custom'; -- specify exact token ALTER USER analyst SET ROLE admin; -- change role ALTER USER analyst RESET PASSWORD; -- regenerate token DROP USER dashboard; -- revoke token SHOW USERS; -- list all users ``` ### Roles and Grants ```sql CREATE ROLE viewer; CREATE ROLE editor; GRANT SELECT ON my_table TO ROLE viewer; GRANT SELECT ON * TO ROLE viewer; -- all tables GRANT ALL ON * TO ROLE admin; GRANT ROLE viewer TO USER dashboard; REVOKE ROLE viewer FROM USER dashboard; SHOW ROLES; SHOW GRANTS; ``` Token permission levels: `admin` (full access) or `readonly` (SELECT only). ### Per-User Priority ```sql ALTER USER etl SET PRIORITY ECONOMY; -- all queries at 50% billing rate ALTER USER etl SET PRIORITY STANDARD; -- back to normal ``` ### Budgets Set spending limits in USD. Queries are refused (HTTP 429) when exceeded. ```sql -- Tenant-wide budget SET BUDGET = 100.00; -- $100 USD limit SHOW BUDGET; -- shows budget/spent/remaining -- Per-user budget ALTER USER analyst SET BUDGET = 10.00; -- $10 for this user ALTER USER dashboard SET BUDGET = 2.00; -- $2 for dashboard -- Remove budget SET BUDGET = 0; -- unlimited ALTER USER analyst SET BUDGET = 0; -- unlimited for user ``` Per-second billing at $0.08/core-hour. Billed on actual CPU core-time consumed, not wall-clock. ### Query History Every query is logged and queryable via SQL: ```sql SELECT * FROM meta.query_history ORDER BY started_at DESC LIMIT 10; -- Slowest queries today SELECT sql, duration_ms FROM meta.query_history WHERE started_at >= current_date AND status = 'ok' ORDER BY duration_ms DESC LIMIT 5; -- Usage by user SELECT user_name, count(*) AS queries, sum(duration_ms)/1000 AS cpu_seconds FROM meta.query_history GROUP BY user_name; -- Materialize into your own table CREATE TABLE my_query_analysis AS SELECT * FROM meta.query_history; ``` ## AI Agent Integration ### Claude Code Add to your project's `CLAUDE.md`: ```markdown ## Data Warehouse This project uses FoxDB (https://db.getdot.ai) for analytics. ### Querying curl -X POST https://db.getdot.ai/ \ -H "X-GetDot-Key: $DOT_DB_TOKEN" \ -d "YOUR SQL HERE" - SQL dialect: PostgreSQL SQL with DuckDB extensions - Default format: TabSeparated. Append FORMAT JSON for JSON. - Import data: CREATE TABLE x AS SELECT * FROM read_csv_auto('url') ``` ### MCP Tool Definition ```json { "name": "query_dot_db", "description": "Run SQL against FoxDB. Returns TSV by default. Append 'FORMAT JSON' for JSON. Import data with: CREATE TABLE x AS SELECT * FROM read_csv_auto('url')", "input_schema": { "type": "object", "properties": { "sql": { "type": "string", "description": "SQL query to execute" } }, "required": ["sql"] } } ``` Implementation: POST the `sql` field to `https://db.getdot.ai/` with header `X-GetDot-Key: `. ### OpenAI Function Calling ```json { "type": "function", "function": { "name": "query_dot_db", "description": "Run SQL against FoxDB. Import data with read_csv_auto('url'). Returns TSV default, append FORMAT JSON for JSON.", "parameters": { "type": "object", "properties": { "sql": { "type": "string", "description": "SQL query" } }, "required": ["sql"] } } } ``` ## SQL Dialect Standard PostgreSQL SQL, plus DuckDB quality-of-life extensions: - **Aggregation**: `count(*)`, `sum()`, `avg()`, `approx_count_distinct()`, `quantile_cont(0.95)` - **Date functions**: `date_trunc('month', col)`, `date_part('year', col)`, `age()`, `current_date` - **Arrays & lists**: `list_value(1,2,3)`, `unnest()`, `array_agg()` - **Structs & maps**: `struct_pack(a := 1)`, `map {'key': 'value'}`, nested types - **String**: `LIKE`, `ILIKE`, `regexp_matches()`, `regexp_replace()`, `string_split()` - **Window functions**: `ROW_NUMBER()`, `RANK()`, `LAG()`, `LEAD()` with `OVER()` - **Import**: `read_csv_auto()`, `read_parquet()`, `read_json_auto()` from any URL - **DuckDB extras**: `GROUP BY ALL`, `SELECT * EXCLUDE (col)`, `PIVOT`, `UNPIVOT`, `QUALIFY` ## Time Travel & Snapshots Every write creates an immutable snapshot. Query any table as it was at any point in time. ```sql -- Query historical data by version or timestamp SELECT * FROM my_table AT (VERSION => 5); SELECT * FROM my_table AT (TIMESTAMP => now() - INTERVAL '1 hour'); -- Browse snapshots SHOW SNAPSHOTS; SHOW SNAPSHOT; -- current version -- Change history (inserts, updates, deletes) SHOW HISTORY FOR my_table; SELECT * FROM db.table_changes('my_table', 10, 20); -- between versions -- Schema evolution (metadata-only, no data rewrite) ALTER TABLE events ADD COLUMN region VARCHAR; ALTER TABLE events DROP COLUMN old_field; ALTER TABLE events RENAME old_name TO new_name; -- Table/column comments COMMENT ON TABLE events IS 'Application event stream'; COMMENT ON COLUMN events.ts IS 'Event timestamp (UTC)'; -- Configuration SHOW OPTIONS; CALL db.set_option('expire_older_than', '7 days'); -- extend retention (default: 24h) CALL db.set_option('parquet_compression', 'zstd'); ``` Default snapshot retention is 24 hours. Extending retention increases storage usage (billed at hot storage rate). Use `CALL ducklake_merge_adjacent_files(...)` to merge small Parquet files for a table (see Manual Compaction below). ## Economy Queue (50% billing) Run batch jobs, ETL, and background work at half price. Economy queries execute when the server is idle, using fewer threads so they don't interfere with real-time queries. ```sql -- Per-query: prefix with SET PRIORITY SET PRIORITY = 'economy'; SELECT * FROM big_table; -- Per-user: all queries from this user run at economy priority ALTER USER etl SET PRIORITY ECONOMY; -- Check priority SHOW PRIORITY; ``` Economy queries are billed at $0.04/core-hour (50% of standard rate). They wait for idle capacity and run with reduced CPU threads, so interactive queries always get priority. Best for: nightly ETL, batch aggregations, data exports, backfills. ## Manual Compaction DuckLake keeps small Parquet files consolidated on its own. For tables that have accumulated many small files, merge them explicitly: ```sql CALL ducklake_merge_adjacent_files('db', 'main.my_table'); ``` ## Pricing | Component | Price | |-----------|-------| | Compute | $0.08 / core-hour (actual CPU core-time, not wall-clock) | | Economy compute | $0.04 / core-hour (batch jobs, ETL — runs when idle) | | Hot storage | $80 / TB / month | | Cold storage (S3) | $8 / TB / month | | Ingestion & egress | Free | Free tier: 10GB storage + 10K CPU-seconds included. No credit card required. Per-second billing granularity. You pay for actual CPU time per query, not wall-clock time. No idle warehouse costs. ## Infrastructure - **Encryption**: TLS 1.3 in transit, AES-256 at rest - **Storage**: Open Parquet format, no vendor lock-in - **Isolation**: Per-tenant database with dedicated storage and resource limits - **Backups**: Nightly off-site backups - **Energy**: 100% renewable energy - **Compliance**: GDPR compliant, EU data residency ## Links - Web: https://db.getdot.ai - Docs: https://db.getdot.ai/llms.txt - Status: https://db.getdot.ai/health