# Prediction Data Optimization Summary

## Problem Identified

In the `_get_all_predictions` method, data was being fetched redundantly:

- **First fetch:** `_collect_model_input_data(symbol)` was called to get standardized input data
- **Second fetch:** Each individual prediction method (`_get_rl_prediction`, `_get_cnn_predictions`, `_get_generic_prediction`) called `build_base_data_input(symbol)` again
- **Third fetch:** Some methods like `_get_rl_state` also called `build_base_data_input(symbol)`

This resulted in the same underlying data (technical indicators, COB data, OHLCV data) being fetched multiple times per prediction cycle.
## Solution Implemented

### 1. Centralized Data Fetching

- Modified `_get_all_predictions` to fetch `BaseDataInput` once using `self.data_provider.build_base_data_input(symbol)`
- Removed the redundant `_collect_model_input_data` method entirely
### 2. Updated Method Signatures

All prediction methods now accept an optional `base_data` parameter:

```python
_get_rl_prediction(model, symbol, base_data=None)
_get_cnn_predictions(model, symbol, base_data=None)
_get_generic_prediction(model, symbol, base_data=None)
_get_rl_state(symbol, base_data=None)
```
### 3. Backward Compatibility

Each method maintains backward compatibility by building `BaseDataInput` if `base_data` is not provided, ensuring existing code continues to work.
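A minimal sketch of this fallback pattern, using placeholder `BaseDataInput` and `DataProvider` classes (their fields and constructors here are assumptions, not the real definitions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BaseDataInput:
    # Placeholder: the real BaseDataInput carries indicators, COB, OHLCV, etc.
    symbol: str

class DataProvider:
    # Placeholder provider; the method name mirrors the summary above
    def build_base_data_input(self, symbol: str) -> BaseDataInput:
        return BaseDataInput(symbol=symbol)

class Orchestrator:
    def __init__(self, data_provider: DataProvider):
        self.data_provider = data_provider

    def _get_rl_state(self, symbol: str, base_data: Optional[BaseDataInput] = None):
        # Backward compatibility: fetch only when no pre-built input was passed
        if base_data is None:
            base_data = self.data_provider.build_base_data_input(symbol)
        return base_data

orch = Orchestrator(DataProvider())
pre_built = orch.data_provider.build_base_data_input("ETH/USDT")
# Pre-built data is reused as-is; legacy callers that omit it still work
assert orch._get_rl_state("ETH/USDT", pre_built) is pre_built
assert orch._get_rl_state("ETH/USDT").symbol == "ETH/USDT"
```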
### 4. Removed Redundant Code

- Eliminated the `_collect_model_input_data` method (60+ lines of redundant code)
- Removed duplicate `build_base_data_input` calls within prediction methods
- Simplified the data flow architecture
## Benefits

### Performance Improvements

- **Reduced API calls:** No more duplicate data fetching per prediction cycle
- **Faster inference:** A single data fetch replaces 3-4 separate fetches
- **Lower latency:** Predictions are generated faster due to reduced data overhead
- **Memory efficiency:** Fewer temporary data structures are created

### Code Quality

- **DRY principle:** Eliminated code duplication
- **Cleaner architecture:** Single source of truth for model input data
- **Maintainability:** Data fetching logic can be modified in one place
- **Consistency:** All models now use the same data structure

### System Reliability

- **Consistent data:** All models use exactly the same input data
- **Reduced race conditions:** A single data fetch eliminates timing inconsistencies
- **Error handling:** Centralized error handling for data fetching
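The centralized error-handling point could take the shape of a single guarded fetch whose failure short-circuits the whole prediction cycle; this is a hypothetical helper, not verified against the actual codebase:

```python
import logging

logger = logging.getLogger(__name__)

def fetch_base_data(data_provider, symbol):
    # One place to handle provider failures for every model's input
    try:
        return data_provider.build_base_data_input(symbol)
    except Exception:
        logger.exception("build_base_data_input failed for %s", symbol)
        return None  # callers skip the cycle instead of each model failing

class FailingProvider:
    # Stand-in provider used to demonstrate the failure path
    def build_base_data_input(self, symbol):
        raise ConnectionError("exchange unreachable")

assert fetch_base_data(FailingProvider(), "ETH/USDT") is None
```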
## Technical Details

### Before Optimization

```python
async def _get_all_predictions(self, symbol: str):
    # First data fetch
    input_data = await self._collect_model_input_data(symbol)

    for model in models:
        if isinstance(model, RLAgentInterface):
            # Second data fetch inside _get_rl_prediction
            rl_prediction = await self._get_rl_prediction(model, symbol)
        elif isinstance(model, CNNModelInterface):
            # Third data fetch inside _get_cnn_predictions
            cnn_predictions = await self._get_cnn_predictions(model, symbol)
```
### After Optimization

```python
async def _get_all_predictions(self, symbol: str):
    # Single data fetch for all models
    base_data = self.data_provider.build_base_data_input(symbol)

    for model in models:
        if isinstance(model, RLAgentInterface):
            # Pass pre-built data; no additional fetch
            rl_prediction = await self._get_rl_prediction(model, symbol, base_data)
        elif isinstance(model, CNNModelInterface):
            # Pass pre-built data; no additional fetch
            cnn_predictions = await self._get_cnn_predictions(model, symbol, base_data)
```
## Testing Results
- ✅ Orchestrator initializes successfully
- ✅ All prediction methods work without errors
- ✅ Generated 3 predictions in test run
- ✅ No performance degradation observed
- ✅ Backward compatibility maintained
## Future Considerations

- Consider caching `BaseDataInput` objects for even better performance
- Monitor memory usage to ensure the optimization doesn't increase the memory footprint
- Add metrics to measure the performance improvement quantitatively
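The caching idea could start as a simple per-symbol TTL cache in front of `build_base_data_input`; the wrapper class, its wiring, and the TTL value are all illustrative assumptions:

```python
import time

class BaseDataInputCache:
    """Per-symbol TTL cache; wraps a build function such as
    data_provider.build_base_data_input (hypothetical wiring)."""

    def __init__(self, build_fn, ttl_seconds: float = 1.0):
        self._build_fn = build_fn
        self._ttl = ttl_seconds
        self._cache = {}  # symbol -> (monotonic timestamp, built data)

    def get(self, symbol: str):
        now = time.monotonic()
        entry = self._cache.get(symbol)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: reuse without rebuilding
        data = self._build_fn(symbol)  # stale or missing: rebuild and store
        self._cache[symbol] = (now, data)
        return data

calls = []
cache = BaseDataInputCache(lambda s: calls.append(s) or {"symbol": s}, ttl_seconds=60)
first = cache.get("ETH/USDT")
second = cache.get("ETH/USDT")
assert first is second and calls == ["ETH/USDT"]  # second call hit the cache
```

Whether this helps depends on how stale a `BaseDataInput` is allowed to be within one prediction cycle; the memory-footprint concern above applies directly to the cache's retained entries.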
This optimization significantly improves the efficiency of the prediction system while maintaining full functionality and backward compatibility.