Dobromir Popov ef71160282 new__training

2025-05-24 02:42:11 +03:00

REAL MARKET DATA POLICY

CRITICAL REQUIREMENT: ONLY REAL MARKET DATA

This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED for training, testing, or inference.

Policy Statement

✅ ALLOWED DATA SOURCES

Binance API: Real-time and historical OHLCV data
Other Exchange APIs: Real market data from legitimate exchanges
Cached Real Data: Previously fetched real market data stored locally
TimescaleDB: Real market data stored in time-series database

❌ PROHIBITED DATA SOURCES

Synthetic data generation
Random data generation
Simulated market conditions
Artificial price movements
Technical indicators computed from synthetic or generated prices
Mock data for testing

Implementation Guidelines

1. Data Provider (`core/data_provider.py`)

Fetches data only from real exchange APIs
Caches real data for performance
Never generates or synthesizes data
Validates data authenticity
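The policy does not say how authenticity is validated. As a minimal sketch, assuming each candle carries a provenance tag, a provider could reject candles from unknown sources and candles that are structurally impossible. The `Candle` type, the `source` field, and the `ALLOWED_SOURCES` tags below are hypothetical and are not taken from `core/data_provider.py`. Checks like these catch obvious problems but cannot prove that data is authentic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candle:
    """One OHLCV bar plus a provenance tag (the tag is an assumption)."""
    timestamp_ms: int
    open: float
    high: float
    low: float
    close: float
    volume: float
    source: str


# Hypothetical provenance tags mirroring the allowed data sources.
ALLOWED_SOURCES = {"binance", "exchange_api", "cache", "timescaledb"}


def validate_candles(candles: List[Candle]) -> List[str]:
    """Return a list of problems; an empty list means no check failed."""
    problems: List[str] = []
    prev_ts = None
    for i, c in enumerate(candles):
        if c.source not in ALLOWED_SOURCES:
            problems.append(f"candle {i}: unknown source {c.source!r}")
        # Open and close must lie inside the [low, high] range.
        if not (c.low <= min(c.open, c.close) and max(c.open, c.close) <= c.high):
            problems.append(f"candle {i}: OHLC values are inconsistent")
        if c.low <= 0 or c.volume < 0:
            problems.append(f"candle {i}: non-positive price or negative volume")
        # Timestamps must be strictly increasing.
        if prev_ts is not None and c.timestamp_ms <= prev_ts:
            problems.append(f"candle {i}: timestamp is not increasing")
        prev_ts = c.timestamp_ms
    return problems
```

A caller would treat any non-empty result as a reason to stop before training starts.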

2. CNN Training (`models/cnn/scalping_cnn.py`)

ScalpingDataGenerator uses only real market data
Dynamic feature detection from actual market data
Training samples generated from real price movements
Labels based on actual future price changes
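Labelling from actual future price changes can be sketched as follows. The three-class scheme, the `horizon` and `threshold` parameters, and the function name are illustrative assumptions. The actual labelling logic in `models/cnn/scalping_cnn.py` is not shown in this document and may differ.

```python
from typing import List


def label_from_future_change(closes: List[float], horizon: int, threshold: float) -> List[int]:
    """Label each bar from the realised return `horizon` bars ahead.

    Returns 1 (up), -1 (down) or 0 (flat). The last `horizon` bars get no
    label because their future price is not yet known.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels: List[int] = []
    for i in range(len(closes) - horizon):
        change = (closes[i + horizon] - closes[i]) / closes[i]
        if change > threshold:
            labels.append(1)
        elif change < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

Because every label is read from a later close in the same real series, no price is invented. Bars without a known future are dropped rather than filled in.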

3. RL Training (`models/rl/scalping_agent.py`)

Environment uses real historical data for backtesting
State representations from real market conditions
Reward functions based on actual trading outcomes
No simulated market scenarios

4. Configuration (`config.yaml`)

training:
  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
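A flag like this only helps if something checks it. The guard below is a sketch and assumes the configuration has already been parsed into a dictionary. `require_real_data_only` and `SyntheticDataError` are hypothetical names, not part of the existing codebase.

```python
from typing import Any, Mapping


class SyntheticDataError(RuntimeError):
    """Raised when the configuration does not enforce real data only."""


def require_real_data_only(config: Mapping[str, Any]) -> None:
    """Fail fast unless training.use_only_real_data is exactly True."""
    training = config.get("training") or {}
    # A missing key or any value other than True counts as a violation.
    if training.get("use_only_real_data") is not True:
        raise SyntheticDataError(
            "training.use_only_real_data must be true; refusing to continue"
        )
```

Calling this at the start of every training or testing entry point makes a missing or disabled flag an immediate error rather than a silent default.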

Verification Checklist

Before any training or testing session, verify:

Data source is a legitimate exchange API
No synthetic data generation functions are called
All training samples come from real market history
Cache contains only real market data
No indicators or features are derived from synthetic data

Code Examples

✅ CORRECT: Using Real Data

# Fetch real market data
df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)

# Generate training cases from real data
features, labels = self.data_generator.generate_training_cases(
    symbol, timeframes, num_samples=10000
)

❌ INCORRECT: Generating Data

# NEVER DO THIS
synthetic_data = generate_synthetic_market_data()
random_prices = np.random.normal(100, 10, 1000)
simulated_candles = create_fake_ohlcv_data()

Logging and Monitoring

All data operations must log their source:

2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
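Log lines in this layout can be produced with Python's standard `logging` module. This is a sketch: the helper name `log_candles_loaded` and the trailing `(source: ...)` suffix are assumptions added for illustration. The sample lines above do not name the source explicitly.

```python
import logging

logger = logging.getLogger("models.cnn.scalping_cnn")


def configure_logging() -> None:
    """Use the 'timestamp - logger - level - message' layout shown above."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )


def log_candles_loaded(count: int, symbol: str, timeframe: str, source: str) -> str:
    """Log how many real candles were loaded and where they came from."""
    message = "Loaded %d real candles for %s %s (source: %s)" % (
        count, symbol, timeframe, source,
    )
    logger.info(message)
    return message
```

Naming the source in every message lets an audit trace each training run back to an exchange API, the local cache, or the database.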

Testing Guidelines

Unit Tests

Test with small samples of real data
Use cached real data for reproducibility
Never create mock market data

Integration Tests

Use real API endpoints (with rate limiting)
Validate data authenticity
Test with multiple timeframes and symbols

Performance Tests

Benchmark with real market data volumes
Test memory usage with actual feature counts
Validate processing speed with real data complexity

Emergency Procedures

If synthetic data is accidentally introduced:

STOP all training immediately
PURGE any models trained with synthetic data
VERIFY data sources and pipelines
RETRAIN from scratch with verified real data
DOCUMENT the incident and prevention measures

Compliance Verification

Regular audits must verify:

Data source authenticity
Training pipeline integrity
Model performance on real data
Cache content validation
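One way to make cache content auditable is to record a checksum for each cache file when it is fetched and compare it during audits. This is a sketch under that assumption. The manifest format (file name mapped to SHA-256 hex digest) and the function names are hypothetical, and the policy does not prescribe a mechanism.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_cache(cache_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare files in cache_dir against digests recorded at fetch time."""
    findings: List[str] = []
    on_disk = {p.name for p in cache_dir.iterdir() if p.is_file()}
    # Files nobody recorded at fetch time have no known origin.
    for name in sorted(on_disk - set(manifest)):
        findings.append(f"{name}: not in manifest (unknown origin)")
    for name, expected in sorted(manifest.items()):
        if name not in on_disk:
            findings.append(f"{name}: listed in manifest but missing")
        elif sha256_of(cache_dir / name) != expected:
            findings.append(f"{name}: contents changed since fetch")
    return findings
```

An empty result shows only that the cache matches what was recorded at fetch time. Whether the recorded data was real still depends on the fetch path itself.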

Contact and Escalation

Any questions about data authenticity should be escalated immediately. When in doubt, ALWAYS choose real market data over convenience.

Remember: The integrity of our trading system depends on using only real market data. No exceptions.
