Unified Data Storage Setup Guide
Overview
The unified data storage system consolidates all market data storage into a single TimescaleDB backend, replacing fragmented Parquet files, pickle files, and in-memory caches.
Prerequisites
1. PostgreSQL with TimescaleDB
You need PostgreSQL 12+ with the TimescaleDB extension installed.
Installation Options
Option A: Docker (Recommended)
docker run -d --name timescaledb \
-p 5432:5432 \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=trading_data \
timescale/timescaledb:latest-pg14
Option B: Local Installation
- Follow TimescaleDB installation guide: https://docs.timescale.com/install/latest/
- Create database:
createdb trading_data
2. Python Dependencies
Ensure you have the required Python packages:
pip install asyncpg
Database Configuration
Update your config.yaml with database connection details:
database:
  host: localhost
  port: 5432
  name: trading_data
  user: postgres
  password: postgres
  pool_size: 20
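The configuration above maps directly onto an asyncpg connection pool. A minimal sketch, assuming the config is loaded with PyYAML; the helper names (build_dsn, create_pool) are illustrative, not part of the project's actual API:

```python
def build_dsn(cfg: dict) -> str:
    """Turn the config.yaml 'database' section into a PostgreSQL DSN."""
    return (
        f"postgresql://{cfg['user']}:{cfg['password']}"
        f"@{cfg['host']}:{cfg['port']}/{cfg['name']}"
    )


async def create_pool(path: str = "config.yaml"):
    # Third-party deps imported lazily: asyncpg (installed above) and PyYAML.
    import asyncpg
    import yaml

    with open(path) as f:
        cfg = yaml.safe_load(f)["database"]
    return await asyncpg.create_pool(
        dsn=build_dsn(cfg),
        max_size=cfg.get("pool_size", 20),
    )
```

The pool's max_size is taken from pool_size so the YAML stays the single source of truth for connection limits.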
Setup Process
Step 1: Run Setup Script
python scripts/setup_unified_storage.py
This script will:
- Connect to the database
- Verify TimescaleDB extension
- Create all required tables
- Convert tables to hypertables
- Create indexes for performance
- Set up continuous aggregates
- Configure compression policies
- Configure retention policies
- Verify the setup
- Run basic operation tests
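The hypertable and index steps above boil down to a handful of SQL statements executed in order. A sketch under stated assumptions: the statements, index name, and chunking defaults here are illustrative, not the exact DDL the script runs.

```python
# Illustrative SQL for the setup steps; the real script's table list,
# chunk intervals, and index names may differ.
SETUP_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS timescaledb;",
    # Convert the plain table into a hypertable partitioned on `timestamp`.
    "SELECT create_hypertable('ohlcv_data', 'timestamp', if_not_exists => TRUE);",
    # Composite index matching the common symbol + time query pattern.
    "CREATE INDEX IF NOT EXISTS idx_ohlcv_symbol_ts "
    "ON ohlcv_data (symbol, timestamp DESC);",
]


async def apply_setup(pool) -> None:
    """Run each setup statement in order (hypothetical helper)."""
    async with pool.acquire() as conn:
        for stmt in SETUP_STATEMENTS:
            await conn.execute(stmt)
```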
Step 2: Verify Setup
The setup script will display schema information:
=== Schema Information ===
Migrations applied: 8
Tables created: 5
Hypertables: 5
Continuous aggregates: 5
=== Table Sizes ===
ohlcv_data: 8192 bytes
order_book_snapshots: 8192 bytes
order_book_1s_agg: 8192 bytes
order_book_imbalances: 8192 bytes
trade_events: 8192 bytes
=== Hypertables ===
ohlcv_data: 0 chunks, compression=enabled
order_book_snapshots: 0 chunks, compression=enabled
order_book_1s_agg: 0 chunks, compression=enabled
order_book_imbalances: 0 chunks, compression=enabled
trade_events: 0 chunks, compression=enabled
=== Continuous Aggregates ===
ohlcv_1m_continuous: 8192 bytes
ohlcv_5m_continuous: 8192 bytes
ohlcv_15m_continuous: 8192 bytes
ohlcv_1h_continuous: 8192 bytes
ohlcv_1d_continuous: 8192 bytes
Database Schema
Tables
1. ohlcv_data
Stores candlestick data for all timeframes with pre-calculated technical indicators.
Columns:
- timestamp (TIMESTAMPTZ): Candle timestamp
- symbol (VARCHAR): Trading pair (e.g., 'ETH/USDT')
- timeframe (VARCHAR): Timeframe (1s, 1m, 5m, 15m, 1h, 1d)
- open_price, high_price, low_price, close_price (DECIMAL): OHLC prices
- volume (DECIMAL): Trading volume
- trade_count (INTEGER): Number of trades
- Technical indicators: rsi_14, macd, macd_signal, bb_upper, bb_middle, bb_lower, etc.
Primary Key: (timestamp, symbol, timeframe)
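Because the primary key is (timestamp, symbol, timeframe), writes can be made idempotent with an upsert. This is one plausible write path given the schema above, not necessarily what the project's ingestion code does:

```python
# Upsert one candle keyed on (timestamp, symbol, timeframe); column names
# follow the schema above, but the conflict-resolution rules are assumptions.
UPSERT_CANDLE = """
INSERT INTO ohlcv_data
    (timestamp, symbol, timeframe, open_price, high_price,
     low_price, close_price, volume, trade_count)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
ON CONFLICT (timestamp, symbol, timeframe) DO UPDATE SET
    high_price  = GREATEST(ohlcv_data.high_price, EXCLUDED.high_price),
    low_price   = LEAST(ohlcv_data.low_price, EXCLUDED.low_price),
    close_price = EXCLUDED.close_price,
    volume      = EXCLUDED.volume,
    trade_count = EXCLUDED.trade_count;
"""


async def write_candle(conn, row: tuple) -> None:
    """Insert or update a single candle (hypothetical helper)."""
    await conn.execute(UPSERT_CANDLE, *row)
```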
2. order_book_snapshots
Stores raw order book snapshots.
Columns:
- timestamp (TIMESTAMPTZ): Snapshot timestamp
- symbol (VARCHAR): Trading pair
- exchange (VARCHAR): Exchange name
- bids (JSONB): Bid levels (top 50)
- asks (JSONB): Ask levels (top 50)
- mid_price, spread, bid_volume, ask_volume (DECIMAL): Calculated metrics
Primary Key: (timestamp, symbol, exchange)
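The calculated metrics can be derived from the raw levels before insert. A minimal sketch, assuming levels arrive as [price, size] pairs sorted best-first; the exact formulas the ingestion code uses are an assumption:

```python
def book_metrics(bids: list, asks: list) -> dict:
    """Derive mid_price, spread, bid_volume, and ask_volume from
    [[price, size], ...] levels sorted best-first."""
    best_bid, best_ask = bids[0][0], asks[0][0]
    return {
        "mid_price": (best_bid + best_ask) / 2,
        "spread": best_ask - best_bid,
        "bid_volume": sum(size for _, size in bids),
        "ask_volume": sum(size for _, size in asks),
    }
```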
3. order_book_1s_agg
Stores 1-second aggregated order book data with $1 price buckets.
Columns:
- timestamp (TIMESTAMPTZ): Aggregation timestamp
- symbol (VARCHAR): Trading pair
- price_bucket (DECIMAL): Price bucket ($1 increments)
- bid_volume, ask_volume (DECIMAL): Aggregated volumes
- bid_count, ask_count (INTEGER): Number of orders
- imbalance (DECIMAL): Order book imbalance
Primary Key: (timestamp, symbol, price_bucket)
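Assuming "$1 increments" means prices are floored to the nearest whole dollar (an assumption, since the bucketing rule isn't spelled out above), the bucket key is a one-liner:

```python
import math


def price_bucket(price: float) -> int:
    """Map a price to its $1 bucket by flooring to the whole dollar.
    The floor-based rule is an assumed interpretation of '$1 increments'."""
    return math.floor(price)
```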
4. order_book_imbalances
Stores multi-timeframe order book imbalance metrics.
Columns:
- timestamp (TIMESTAMPTZ): Calculation timestamp
- symbol (VARCHAR): Trading pair
- imbalance_1s, imbalance_5s, imbalance_15s, imbalance_60s (DECIMAL): Imbalances
- volume_imbalance_1s, volume_imbalance_5s, etc. (DECIMAL): Volume-weighted imbalances
- price_range (DECIMAL): Price range used for calculation
Primary Key: (timestamp, symbol)
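A common definition of order book imbalance is (bid volume − ask volume) / total volume, which yields a value in [−1, 1]; whether the project uses exactly this formula is an assumption:

```python
def order_book_imbalance(bid_volume: float, ask_volume: float) -> float:
    """(bid - ask) / (bid + ask): +1 is all bids, -1 all asks, 0 balanced.
    Returns 0.0 for an empty book to avoid division by zero."""
    total = bid_volume + ask_volume
    return 0.0 if total == 0 else (bid_volume - ask_volume) / total
```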
5. trade_events
Stores individual trade events.
Columns:
- timestamp (TIMESTAMPTZ): Trade timestamp
- symbol (VARCHAR): Trading pair
- exchange (VARCHAR): Exchange name
- price (DECIMAL): Trade price
- size (DECIMAL): Trade size
- side (VARCHAR): Trade side ('buy' or 'sell')
- trade_id (VARCHAR): Unique trade identifier
Primary Key: (timestamp, symbol, exchange, trade_id)
Continuous Aggregates
Continuous aggregates automatically generate higher timeframe data from lower timeframes:
- ohlcv_1m_continuous: 1-minute candles from 1-second data
- ohlcv_5m_continuous: 5-minute candles from 1-minute data
- ohlcv_15m_continuous: 15-minute candles from 5-minute data
- ohlcv_1h_continuous: 1-hour candles from 15-minute data
- ohlcv_1d_continuous: 1-day candles from 1-hour data
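In TimescaleDB, views like those above are defined with CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous). A sketch of what the 1-minute view might look like; the actual definitions live in the setup script, and the column choices here are assumptions:

```python
# Illustrative definition of a 1-minute continuous aggregate over 1-second
# candles; mirrors the ohlcv_data columns but is not the exact DDL.
CAGG_1M = """
CREATE MATERIALIZED VIEW ohlcv_1m_continuous
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 minute', timestamp) AS timestamp,
    symbol,
    first(open_price, timestamp)  AS open_price,
    max(high_price)               AS high_price,
    min(low_price)                AS low_price,
    last(close_price, timestamp)  AS close_price,
    sum(volume)                   AS volume
FROM ohlcv_data
WHERE timeframe = '1s'
GROUP BY 1, 2;
"""
```

TimescaleDB's first()/last() aggregates pick the open and close by timestamp order, which is what makes OHLC roll-ups expressible as a plain GROUP BY.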
Compression Policies
Data is automatically compressed to save storage:
- ohlcv_data: Compress after 7 days
- order_book_snapshots: Compress after 1 day
- order_book_1s_agg: Compress after 2 days
- order_book_imbalances: Compress after 2 days
- trade_events: Compress after 7 days
Expected compression ratio: >80%
Retention Policies
Old data is automatically deleted:
- ohlcv_data: Retain for 2 years
- order_book_snapshots: Retain for 30 days
- order_book_1s_agg: Retain for 60 days
- order_book_imbalances: Retain for 60 days
- trade_events: Retain for 90 days
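Policies like the compression and retention rules above are registered with TimescaleDB's add_compression_policy and add_retention_policy functions. The intervals below echo the tables above, but treat the statements as a sketch (in particular, the segmentby column is an assumption):

```python
# Sketch of policy registration; every interval and the segmentby choice
# are assumptions echoing the sections above, not the script's exact DDL.
POLICY_STATEMENTS = [
    # Compression must be enabled on a hypertable before a policy is added.
    "ALTER TABLE ohlcv_data SET (timescaledb.compress, "
    "timescaledb.compress_segmentby = 'symbol');",
    "SELECT add_compression_policy('ohlcv_data', INTERVAL '7 days');",
    "SELECT add_retention_policy('ohlcv_data', INTERVAL '2 years');",
    "SELECT add_compression_policy('order_book_snapshots', INTERVAL '1 day');",
    "SELECT add_retention_policy('order_book_snapshots', INTERVAL '30 days');",
]
```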
Performance Optimization
Indexes
All tables have optimized indexes for common query patterns:
- Symbol + timestamp queries
- Timeframe-specific queries
- Exchange-specific queries
- Multi-column composite indexes
Query Performance Targets
- Cache reads: <10ms
- Single timestamp queries: <100ms
- Time range queries (1 hour): <500ms
- Ingestion throughput: >1000 ops/sec
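To check your own queries against these targets, wrapping a call with a timer is enough. A stdlib-only sketch (the helper name is illustrative):

```python
import time


def timed_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

For async query paths, the same pattern applies around an awaited fetch instead of a direct call.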
Best Practices
- Use time_bucket for aggregations:

  SELECT time_bucket('1 minute', timestamp) AS bucket,
         symbol,
         avg(close_price) AS avg_price
  FROM ohlcv_data
  WHERE symbol = 'ETH/USDT'
    AND timestamp >= NOW() - INTERVAL '1 hour'
  GROUP BY bucket, symbol;

- Query specific timeframes:

  SELECT * FROM ohlcv_data
  WHERE symbol = 'ETH/USDT'
    AND timeframe = '1m'
    AND timestamp >= NOW() - INTERVAL '1 day'
  ORDER BY timestamp DESC;

- Use continuous aggregates for historical data:

  SELECT * FROM ohlcv_1h_continuous
  WHERE symbol = 'ETH/USDT'
    AND timestamp >= NOW() - INTERVAL '7 days'
  ORDER BY timestamp DESC;
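From Python, the same kind of query runs through asyncpg with $n placeholders rather than string interpolation. A sketch, with pool creation elided and the helper name hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Parameterized variant of the timeframe query above; $1/$2 placeholders
# let asyncpg handle quoting and type conversion.
RECENT_1M = """
SELECT * FROM ohlcv_data
WHERE symbol = $1
  AND timeframe = '1m'
  AND timestamp >= $2
ORDER BY timestamp DESC;
"""


async def recent_candles(pool, symbol: str, hours: int = 24):
    """Fetch recent 1m candles for a symbol (hypothetical helper)."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    async with pool.acquire() as conn:
        return await conn.fetch(RECENT_1M, symbol, cutoff)
```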
Monitoring
Check Database Size
In TimescaleDB 2.x the timescaledb_information.hypertables view has no size columns, so sizes come from the hypertable_size function:

SELECT
    hypertable_name,
    pg_size_pretty(hypertable_size(format('%I.%I', hypertable_schema, hypertable_name)::regclass)) AS total_size
FROM timescaledb_information.hypertables
WHERE hypertable_schema = 'public';

For compressed vs. uncompressed size of a single hypertable, use hypertable_compression_stats:

SELECT
    pg_size_pretty(before_compression_total_bytes) AS uncompressed,
    pg_size_pretty(after_compression_total_bytes) AS compressed
FROM hypertable_compression_stats('ohlcv_data');
Check Chunk Information
SELECT
    hypertable_name,
    num_chunks,
    compression_enabled
FROM timescaledb_information.hypertables
WHERE hypertable_schema = 'public';

Per-chunk detail, including which chunks are compressed, is available from timescaledb_information.chunks.
Check Continuous Aggregate Status
SELECT
    view_name,
    materialization_hypertable_name,
    pg_size_pretty(hypertable_size(format('%I.%I', materialization_hypertable_schema, materialization_hypertable_name)::regclass)) AS size
FROM timescaledb_information.continuous_aggregates
WHERE view_schema = 'public';
Troubleshooting
TimescaleDB Extension Not Found
If you see "TimescaleDB extension not found":
- Ensure TimescaleDB is installed
- Connect to the database and run:
  CREATE EXTENSION timescaledb;
- Restart the setup script
Connection Refused
If you see "connection refused":
- Check PostgreSQL is running:
  pg_isready
- Verify connection details in config.yaml
- Check firewall settings
Permission Denied
If you see "permission denied":
- Ensure database user has CREATE privileges
- Grant privileges:
GRANT ALL PRIVILEGES ON DATABASE trading_data TO postgres;
Slow Queries
If queries are slow:
- Check if indexes exist:
  \di in psql
- Analyze the query plan:
  EXPLAIN ANALYZE <your query>
- Ensure compression is enabled
- Consider adding more specific indexes
Next Steps
After setup is complete:
- Implement data models (Task 2)
- Implement cache layer (Task 3)
- Implement database connection layer (Task 4)
- Start data migration from Parquet files (Task 7)
Support
For issues or questions:
- Check TimescaleDB docs: https://docs.timescale.com/
- Review PostgreSQL logs:
  tail -f /var/log/postgresql/postgresql-*.log
- Enable debug logging in the setup script