folder structure reorganize

2025-06-25 11:42:12 +03:00
parent 61b31a3089
commit 03fa28a12d
127 changed files with 3108 additions and 1774 deletions
--- a/reports/REAL_MARKET_DATA_POLICY.md
+++ b/reports/REAL_MARKET_DATA_POLICY.md
@@ -0,0 +1,139 @@
+# REAL MARKET DATA POLICY
+
+## CRITICAL REQUIREMENT: ONLY REAL MARKET DATA
+
+This trading system is designed to work EXCLUSIVELY with real market data from cryptocurrency exchanges. **NO SYNTHETIC, GENERATED, OR SIMULATED DATA IS ALLOWED** for training, testing, or inference.
+
+## Policy Statement
+
+### ✅ ALLOWED DATA SOURCES
+- **Binance API**: Real-time and historical OHLCV data
+- **Other Exchange APIs**: Real market data from legitimate exchanges
+- **Cached Real Data**: Previously fetched real market data stored locally
+- **TimescaleDB**: Real market data stored in time-series database
+
+### ❌ PROHIBITED DATA SOURCES
+- Synthetic data generation
+- Random data generation
+- Simulated market conditions
+- Artificial price movements
+- Technical indicators fabricated rather than computed from real prices
+- Mock data for testing
+
+## Implementation Guidelines
+
+### 1. Data Provider (`core/data_provider.py`)
+- Only fetches data from real exchange APIs
+- Caches real data for performance
+- Never generates or synthesizes data
+- Validates data authenticity
+
+### 2. CNN Training (`models/cnn/scalping_cnn.py`)
+- `ScalpingDataGenerator` only uses real market data
+- Dynamic feature detection from actual market data
+- Training samples generated from real price movements
+- Labels based on actual future price changes
+
+### 3. RL Training (`models/rl/scalping_agent.py`)
+- Environment uses real historical data for backtesting
+- State representations from real market conditions
+- Reward functions based on actual trading outcomes
+- No simulated market scenarios
+
+### 4. Configuration (`config.yaml`)
+```yaml
+training:
+  use_only_real_data: true  # CRITICAL: Never use synthetic/generated data
+```
+
+## Verification Checklist
+
+Before any training or testing session, verify:
+
+- [ ] Data source is a legitimate exchange API
+- [ ] No data generation functions are called
+- [ ] All training samples come from real market history
+- [ ] Cache contains only real market data
+- [ ] No synthetic indicators or features are used
+
+## Code Examples
+
+### ✅ CORRECT: Using Real Data
+```python
+# Fetch real market data
+df = self.data_provider.get_historical_data(symbol, timeframe, limit=1000, refresh=False)
+
+# Generate training cases from real data
+features, labels = self.data_generator.generate_training_cases(
+    symbol, timeframes, num_samples=10000
+)
+```
+
+## Logging and Monitoring
+
+All data operations must log their source:
+```
+2025-05-24 02:36:16,674 - models.cnn.scalping_cnn - INFO - Generating 10000 training cases for ETH/USDT from REAL market data
+2025-05-24 02:36:17,366 - models.cnn.scalping_cnn - INFO - Loaded 1000 real candles for ETH/USDT 1s
+```
+
+## Testing Guidelines
+
+### Unit Tests
+- Test with small samples of real data
+- Use cached real data for reproducibility
+- Never create mock market data
+
+### Integration Tests
+- Use real API endpoints (with rate limiting)
+- Validate data authenticity
+- Test with multiple timeframes and symbols
+
+### Performance Tests
+- Benchmark with real market data volumes
+- Test memory usage with actual feature counts
+- Validate processing speed with real data complexity
+
+## Emergency Procedures
+
+If synthetic data is accidentally introduced:
+
+1. **STOP** all training immediately
+2. **PURGE** any models trained with synthetic data
+3. **VERIFY** data sources and pipelines
+4. **RETRAIN** from scratch with verified real data
+5. **DOCUMENT** the incident and prevention measures
+
+## Compliance Verification
+
+Regular audits must verify:
+- Data source authenticity
+- Training pipeline integrity
+- Model performance on real data
+- Cache content validation
+
+## Contact and Escalation
+
+Any questions about data authenticity should be escalated immediately. When in doubt, **ALWAYS** choose real market data over convenience.
+
+---
+
+**Remember: The integrity of our trading system depends on using only real market data. No exceptions.**
+
+## ❌ EXAMPLES OF FORBIDDEN OPERATIONS
+
+### Code Patterns to NEVER Use
+
+```python
+# ❌ FORBIDDEN PATTERNS - DO NOT IMPLEMENT
+
+# These patterns are STRICTLY FORBIDDEN and are deliberately not shown as code:
+# - Any random data generation
+# - Any synthetic price creation
+# - Any mock trading data
+# - Any simulated market scenarios
+
+# ✅ ONLY ALLOWED: Real market data from exchanges
+real_data = binance_client.get_historical_klines(symbol, interval, limit)
+live_price = binance_client.get_ticker_price(symbol)
+```
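
The policy could be backed by a runtime guard. The following is a minimal sketch, not code from this commit: `DataSource`, `Candle`, `find_candle_problems`, and `require_real_data` are hypothetical names, and the `use_only_real_data` parameter is assumed to be read from the `training.use_only_real_data` key in `config.yaml`. The guard rejects any dataset whose provenance tag is not on the allowlist, then runs basic structural checks on the candles. Those checks only catch malformed rows; they cannot prove that data is real.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class DataSource(Enum):
    """Hypothetical provenance tags; the first three mirror the allowed sources."""
    EXCHANGE_API = "exchange_api"
    LOCAL_CACHE = "local_cache"
    TIMESCALEDB = "timescaledb"
    SYNTHETIC = "synthetic"  # present only so that it can be rejected


ALLOWED_SOURCES = frozenset(
    {DataSource.EXCHANGE_API, DataSource.LOCAL_CACHE, DataSource.TIMESCALEDB}
)


class ProhibitedDataSourceError(RuntimeError):
    """Raised when data from a non-allowed source reaches the pipeline."""


@dataclass(frozen=True)
class Candle:
    """One OHLCV row; the field names are an assumption for this sketch."""
    open_time_ms: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def find_candle_problems(candles: Sequence[Candle]) -> List[str]:
    """Return structural problems; an empty list means none were found."""
    problems: List[str] = []
    previous_time = None
    for index, candle in enumerate(candles):
        if candle.low <= 0:
            problems.append(f"candle {index}: non-positive low price")
        if candle.high < max(candle.open, candle.close):
            problems.append(f"candle {index}: high below open/close")
        if candle.low > min(candle.open, candle.close):
            problems.append(f"candle {index}: low above open/close")
        if candle.volume < 0:
            problems.append(f"candle {index}: negative volume")
        if previous_time is not None and candle.open_time_ms <= previous_time:
            problems.append(f"candle {index}: timestamp not strictly increasing")
        previous_time = candle.open_time_ms
    return problems


def require_real_data(
    candles: Sequence[Candle],
    source: DataSource,
    use_only_real_data: bool = True,
) -> None:
    """Raise if the source is not allowed or the candles are malformed."""
    if use_only_real_data and source not in ALLOWED_SOURCES:
        raise ProhibitedDataSourceError(f"source {source.value!r} is not allowed")
    problems = find_candle_problems(candles)
    if problems:
        raise ValueError("; ".join(problems))
```

A training entry point might call `require_real_data(candles, DataSource.EXCHANGE_API)` before building samples, so that a mis-tagged or malformed dataset fails loudly instead of reaching the model.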