# Strix Halo NPU Integration Guide

## Overview

This guide explains how to use AMD's Strix Halo NPU (Neural Processing Unit) to accelerate your neural network trading models on Linux. The NPU provides significant performance improvements for inference workloads, especially for CNNs and transformers.

## Prerequisites

- AMD Strix Halo processor
- Linux kernel 6.11+ (Ubuntu 24.04 LTS recommended)
- AMD Ryzen AI Software 1.5+
- ROCm 6.4.1+ (optional, for GPU acceleration)

## Quick Start

### 1. Install NPU Software Stack

```bash
# Run the setup script
chmod +x setup_strix_halo_npu.sh
./setup_strix_halo_npu.sh

# Reboot to load NPU drivers
sudo reboot
```

### 2. Verify NPU Detection

```bash
# Check NPU devices
ls /dev/amdxdna*

# Run NPU test
python3 test_npu.py
```

### 3. Test Model Integration

```bash
# Run comprehensive integration tests
python3 test_npu_integration.py
```

## Architecture

### NPU Acceleration Stack

```
┌─────────────────────────────────────┐
│           Trading Models            │
│     (CNN, Transformer, RL, DQN)     │
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│          Model Interfaces           │
│(CNNModelInterface, RLAgentInterface)│
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│         NPUAcceleratedModel         │
│      (ONNX Runtime + DirectML)      │
└─────────────┬───────────────────────┘
              │
┌─────────────▼───────────────────────┐
│           Strix Halo NPU            │
│         (XDNA Architecture)         │
└─────────────────────────────────────┘
```

### Key Components

1. **NPUDetector**: Detects NPU availability and capabilities
2. **ONNXModelWrapper**: Wraps ONNX models for NPU inference
3. **PyTorchToONNXConverter**: Converts PyTorch models to ONNX
4. **NPUAcceleratedModel**: High-level interface for NPU acceleration
5. **Enhanced Model Interfaces**: Updated interfaces with NPU support

## Usage Examples

### Basic NPU Acceleration

```python
from utils.npu_acceleration import NPUAcceleratedModel
import torch.nn as nn

# Create your PyTorch model
model = YourTradingModel()

# Wrap with NPU acceleration
npu_model = NPUAcceleratedModel(
    pytorch_model=model,
    model_name="trading_model",
    input_shape=(60, 50)  # Your input shape
)

# Run inference
import numpy as np
test_data = np.random.randn(1, 60, 50).astype(np.float32)
prediction = npu_model.predict(test_data)
```

### Using Enhanced Model Interfaces

```python
from NN.models.model_interfaces import CNNModelInterface

# Create CNN model interface with NPU support
cnn_interface = CNNModelInterface(
    model=your_cnn_model,
    name="trading_cnn",
    enable_npu=True,
    input_shape=(60, 50)
)

# Get acceleration info
info = cnn_interface.get_acceleration_info()
print(f"NPU available: {info['npu_available']}")

# Make predictions (automatically uses NPU if available)
prediction = cnn_interface.predict(test_data)
```

### Converting Existing Models

```python
from utils.npu_acceleration import PyTorchToONNXConverter

# Convert your existing model
converter = PyTorchToONNXConverter(your_model)
success = converter.convert(
    output_path="models/your_model.onnx",
    input_shape=(60, 50),
    input_names=['trading_features'],
    output_names=['trading_signals']
)
```

## Performance Benefits

### Expected Improvements

- **Inference Speed**: 3-6x faster than CPU
- **Power Efficiency**: Lower power consumption than GPU
- **Latency**: Sub-millisecond inference for small models
- **Memory**: Efficient memory usage for NPU-optimized models

### Benchmarking

```python
from utils.npu_acceleration import benchmark_npu_vs_cpu

# Benchmark your model
results = benchmark_npu_vs_cpu(
    model_path="models/your_model.onnx",
    test_data=your_test_data,
    iterations=100
)

print(f"NPU speedup: {results['speedup']:.2f}x")
print(f"NPU latency: {results['npu_latency_ms']:.2f} ms")
```

## Integration with Existing Code

### Orchestrator Integration

The orchestrator automatically detects and uses NPU acceleration when available:

```python
# In core/orchestrator.py
from NN.models.model_interfaces import CNNModelInterface, RLAgentInterface

# Models automatically use NPU if available
cnn_interface = CNNModelInterface(
    model=cnn_model,
    name="trading_cnn",
    enable_npu=True,  # Enable NPU acceleration
    input_shape=(60, 50)
)
```

### Dashboard Integration

The dashboard shows NPU status and performance metrics:

```python
# NPU status is automatically displayed in the dashboard
# Check the "Acceleration" section for NPU information
```

## Troubleshooting

### Common Issues

1. **NPU Not Detected**

   ```bash
   # Check kernel version (need 6.11+)
   uname -r

   # Check NPU devices
   ls /dev/amdxdna*

   # Reboot if needed
   sudo reboot
   ```

2. **ONNX Runtime Issues**

   ```bash
   # Reinstall ONNX Runtime with DirectML
   pip install onnxruntime-directml --force-reinstall
   ```

3. **Model Conversion Failures**

   ```python
   # Check model compatibility
   # Some PyTorch operations may not be supported
   # Use simpler model architectures for NPU
   ```
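When a conversion does fail, a quick sanity check on the exported file usually narrows the problem down to either a malformed graph or an operator the runtime cannot place. The following is a minimal sketch using the standard `onnx` and `onnxruntime` APIs rather than the project's utilities; the model path is a placeholder.

```python
import onnx
import onnxruntime as ort

model_path = "models/your_model.onnx"  # placeholder path

# Structural check: raises a descriptive error if the exported graph is malformed
onnx.checker.check_model(onnx.load(model_path))

# Try CPU first to confirm the graph runs at all, then the NPU/DirectML provider;
# unsupported operators usually surface as exceptions here.
for providers in (['CPUExecutionProvider'],
                  ['DmlExecutionProvider', 'CPUExecutionProvider']):
    try:
        session = ort.InferenceSession(model_path, providers=providers)
        print(f"OK with {providers}: inputs={[i.name for i in session.get_inputs()]}")
    except Exception as exc:
        print(f"Failed with {providers}: {exc}")
```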

### Debug Mode

```python
import logging
logging.basicConfig(level=logging.DEBUG)

# Enable detailed NPU logging
from utils.npu_detector import get_npu_info
print(get_npu_info())
```

## Best Practices

### Model Optimization

1. **Use ONNX-compatible operations**: Avoid custom PyTorch operations
2. **Optimize input shapes**: Use fixed input shapes when possible
3. **Batch processing**: Process multiple samples together
4. **Model quantization**: Consider INT8 quantization for better performance (see the sketch below)
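As an illustration of the quantization point, ONNX Runtime's dynamic quantization converts an exported model's weights to INT8 without retraining. This is a generic ONNX Runtime sketch, not one of the project's helpers; the paths are placeholders.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Convert weights to INT8; activations are quantized dynamically at runtime.
quantize_dynamic(
    model_input="models/your_model.onnx",        # placeholder: exported FP32 model
    model_output="models/your_model.int8.onnx",  # placeholder: quantized output
    weight_type=QuantType.QInt8,
)
```

Whether the INT8 model is actually faster on the NPU should be confirmed with the `benchmark_npu_vs_cpu` helper shown earlier, since not every operator benefits equally.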

### Memory Management

1. **Monitor NPU memory usage**: NPU has limited memory
2. **Use model streaming**: Load/unload models as needed (see the sketch after this list)
3. **Optimize batch sizes**: Balance performance vs memory usage
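One way to implement model streaming is to create the ONNX Runtime session only when a model is needed and drop the reference afterwards so its memory can be reclaimed. This is a plain ONNX Runtime sketch under that assumption, not a project API; `model_path` is a placeholder.

```python
import onnxruntime as ort

def run_once(model_path: str, batch):
    """Load a model, run one batch, then release the session."""
    session = ort.InferenceSession(
        model_path,
        providers=['DmlExecutionProvider', 'CPUExecutionProvider'],
    )
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: batch})
    del session  # release the session before loading the next model
    return outputs
```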

### Error Handling

1. **Always provide fallbacks**: NPU may not always be available (see the fallback sketch after this list)
2. **Handle conversion errors**: Some models may not convert properly
3. **Monitor performance**: Ensure NPU is actually faster than CPU
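A minimal fallback pattern for the first point, assuming the `NPUAcceleratedModel` wrapper from the usage examples and an existing PyTorch model; the try/except structure here is illustrative rather than the project's actual error handling.

```python
import numpy as np
import torch

from utils.npu_acceleration import NPUAcceleratedModel

def build_predictor(pytorch_model, input_shape=(60, 50)):
    """Prefer NPU-accelerated inference, but fall back to plain PyTorch."""
    try:
        npu_model = NPUAcceleratedModel(
            pytorch_model=pytorch_model,
            model_name="trading_model",
            input_shape=input_shape,
        )
        return npu_model.predict
    except Exception as exc:  # NPU missing, conversion failed, etc.
        print(f"NPU acceleration unavailable, falling back to CPU: {exc}")
        pytorch_model.eval()

        def cpu_predict(x):
            with torch.no_grad():
                return pytorch_model(torch.as_tensor(np.asarray(x), dtype=torch.float32)).numpy()

        return cpu_predict
```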

## Advanced Configuration

### Custom ONNX Providers

```python
from utils.npu_detector import get_onnx_providers

# Get available providers
providers = get_onnx_providers()
print(f"Available providers: {providers}")

# Use specific provider order
custom_providers = ['DmlExecutionProvider', 'CPUExecutionProvider']
```
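To make the custom order take effect, the provider list is passed when the ONNX Runtime session is created. This is standard ONNX Runtime usage rather than a wrapper-specific API; the model path is a placeholder.

```python
import onnxruntime as ort

# Create a session with an explicit provider priority: DirectML (NPU) first, CPU second.
session = ort.InferenceSession(
    "models/your_model.onnx",  # placeholder path
    providers=custom_providers,
)
print(f"Session is using: {session.get_providers()}")
```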

### Performance Tuning

```python
import onnxruntime as ort

# Enable ONNX Runtime graph optimizations and profiling
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.enable_profiling = True
```
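The options above only take effect once they are passed to a session. The sketch below shows that step and how to retrieve the profiling trace; it is standard ONNX Runtime usage with a placeholder model path.

```python
session = ort.InferenceSession(
    "models/your_model.onnx",  # placeholder path
    sess_options=session_options,
    providers=['DmlExecutionProvider', 'CPUExecutionProvider'],
)

# ... run some inferences, then write the profiling trace to a JSON file
profile_path = session.end_profiling()
print(f"Profiling trace written to: {profile_path}")
```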

## Monitoring and Metrics

### Performance Monitoring

```python
# Get detailed performance info
perf_info = npu_model.get_performance_info()
print(f"Providers: {perf_info['providers']}")
print(f"Input shapes: {perf_info['input_shapes']}")
```

### Dashboard Metrics

The dashboard automatically displays:

- NPU availability status
- Inference latency
- Memory usage
- Provider information

## Future Enhancements

### Planned Features

1. **Automatic model optimization**: Auto-tune models for NPU
2. **Dynamic provider selection**: Choose best provider automatically
3. **Advanced benchmarking**: More detailed performance analysis
4. **Model compression**: Automatic model size optimization

### Contributing

To contribute NPU improvements:

1. Test with your specific models
2. Report performance improvements
3. Suggest optimization techniques
4. Contribute to the NPU acceleration utilities

## Support

For issues with NPU integration:

1. Check the troubleshooting section
2. Run the integration tests
3. Check AMD documentation for latest updates
4. Verify kernel and driver compatibility

---

**Note**: NPU acceleration is most effective for inference workloads. Training is still recommended on GPU or CPU. The NPU excels at real-time trading inference where low latency is critical.