Strix Halo NPU Integration Guide

Overview

This guide explains how to use AMD's Strix Halo NPU (Neural Processing Unit) to accelerate your neural network trading models on Linux. The NPU provides significant performance improvements for inference workloads, especially for CNNs and transformers.

Prerequisites

  • AMD Strix Halo processor
  • Linux kernel 6.11+ (Ubuntu 24.04 LTS recommended)
  • AMD Ryzen AI Software 1.5+
  • ROCm 6.4.1+ (optional, for GPU acceleration)

Quick Start

1. Install NPU Software Stack

# Run the setup script
chmod +x setup_strix_halo_npu.sh
./setup_strix_halo_npu.sh

# Reboot to load NPU drivers
sudo reboot

2. Verify NPU Detection

# Check NPU devices
ls /dev/amdxdna*

# Run NPU test
python3 test_npu.py

3. Test Model Integration

# Run comprehensive integration tests
python3 test_npu_integration.py

Architecture

NPU Acceleration Stack

┌─────────────────────────────────────────┐
│              Trading Models             │
│       (CNN, Transformer, RL, DQN)       │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│             Model Interfaces            │
│  (CNNModelInterface, RLAgentInterface)  │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│           NPUAcceleratedModel           │
│        (ONNX Runtime + DirectML)        │
└────────────────────┬────────────────────┘
                     │
┌────────────────────▼────────────────────┐
│              Strix Halo NPU             │
│           (XDNA Architecture)           │
└─────────────────────────────────────────┘

Key Components

  1. NPUDetector: Detects NPU availability and capabilities
  2. ONNXModelWrapper: Wraps ONNX models for NPU inference
  3. PyTorchToONNXConverter: Converts PyTorch models to ONNX
  4. NPUAcceleratedModel: High-level interface for NPU acceleration
  5. Enhanced Model Interfaces: Updated interfaces with NPU support

Usage Examples

Basic NPU Acceleration

from utils.npu_acceleration import NPUAcceleratedModel
import torch.nn as nn

# Create your PyTorch model
model = YourTradingModel()

# Wrap with NPU acceleration
npu_model = NPUAcceleratedModel(
    pytorch_model=model,
    model_name="trading_model",
    input_shape=(60, 50)  # Your input shape
)

# Run inference
import numpy as np
test_data = np.random.randn(1, 60, 50).astype(np.float32)
prediction = npu_model.predict(test_data)

Using Enhanced Model Interfaces

from NN.models.model_interfaces import CNNModelInterface

# Create CNN model interface with NPU support
cnn_interface = CNNModelInterface(
    model=your_cnn_model,
    name="trading_cnn",
    enable_npu=True,
    input_shape=(60, 50)
)

# Get acceleration info
info = cnn_interface.get_acceleration_info()
print(f"NPU available: {info['npu_available']}")

# Make predictions (automatically uses NPU if available)
prediction = cnn_interface.predict(test_data)

Converting Existing Models

from utils.npu_acceleration import PyTorchToONNXConverter

# Convert your existing model
converter = PyTorchToONNXConverter(your_model)
success = converter.convert(
    output_path="models/your_model.onnx",
    input_shape=(60, 50),
    input_names=['trading_features'],
    output_names=['trading_signals']
)

Performance Benefits

Expected Improvements

  • Inference Speed: 3-6x faster than CPU
  • Power Efficiency: Lower power consumption than GPU
  • Latency: Sub-millisecond inference for small models
  • Memory: Efficient memory usage for NPU-optimized models

Benchmarking

from utils.npu_acceleration import benchmark_npu_vs_cpu

# Benchmark your model
results = benchmark_npu_vs_cpu(
    model_path="models/your_model.onnx",
    test_data=your_test_data,
    iterations=100
)

print(f"NPU speedup: {results['speedup']:.2f}x")
print(f"NPU latency: {results['npu_latency_ms']:.2f} ms")

Integration with Existing Code

Orchestrator Integration

The orchestrator automatically detects and uses NPU acceleration when available:

# In core/orchestrator.py
from NN.models.model_interfaces import CNNModelInterface, RLAgentInterface

# Models automatically use NPU if available
cnn_interface = CNNModelInterface(
    model=cnn_model,
    name="trading_cnn",
    enable_npu=True,  # Enable NPU acceleration
    input_shape=(60, 50)
)

Dashboard Integration

The dashboard shows NPU status and performance metrics:

# NPU status is automatically displayed in the dashboard
# Check the "Acceleration" section for NPU information
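
If you want to surface the same information outside the dashboard (for example in logs or a custom panel), a minimal sketch is shown below. It reuses get_npu_info() from utils.npu_detector (see Debug Mode below) and the get_performance_info() method shown under Performance Monitoring; the dictionary keys are illustrative, not a fixed API.

from utils.npu_detector import get_npu_info

def collect_npu_status(npu_model=None):
    """Assemble NPU status for display; the key names here are illustrative."""
    status = {'npu': get_npu_info()}  # detection and capability info
    if npu_model is not None:
        # Per-model runtime details (providers, input shapes, ...)
        status['model'] = npu_model.get_performance_info()
    return status

print(collect_npu_status())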

Troubleshooting

Common Issues

  1. NPU Not Detected

    # Check kernel version (need 6.11+)
    uname -r
    
    # Check NPU devices
    ls /dev/amdxdna*
    
    # Reboot if needed
    sudo reboot
    
  2. ONNX Runtime Issues

    # Reinstall ONNX Runtime with DirectML
    pip install onnxruntime-directml --force-reinstall
    
  3. Model Conversion Failures

    # Check model compatibility: some PyTorch operations are not supported
    # by the ONNX exporter, so prefer simpler architectures for the NPU
    # (a validation sketch follows after this list)
    
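If a conversion appears to succeed but inference later fails, validating the exported file can help narrow the problem down. A minimal sketch using the standard onnx checker (assumes the onnx package is installed; the path is illustrative):

import onnx

model_path = "models/your_model.onnx"  # illustrative path
onnx.checker.check_model(onnx.load(model_path))  # raises if the graph is invalid
print(f"{model_path} passed ONNX validation")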

Debug Mode

import logging
logging.basicConfig(level=logging.DEBUG)

# Enable detailed NPU logging
from utils.npu_detector import get_npu_info
print(get_npu_info())

Best Practices

Model Optimization

  1. Use ONNX-compatible operations: Avoid custom PyTorch operations
  2. Optimize input shapes: Use fixed input shapes when possible
  3. Batch processing: Process multiple samples together
  4. Model quantization: Consider INT8 quantization for better performance (a minimal sketch follows below)
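
For item 4, ONNX Runtime ships a dynamic quantization utility you can try as a starting point. A minimal sketch (paths are illustrative; verify that the quantized model still runs on your target provider, since operator support varies):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize weights to INT8; activations remain float
quantize_dynamic(
    model_input="models/your_model.onnx",        # illustrative paths
    model_output="models/your_model_int8.onnx",
    weight_type=QuantType.QInt8,
)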

Memory Management

  1. Monitor NPU memory usage: NPU has limited memory
  2. Use model streaming: Load/unload models as needed (see the sketch after this list)
  3. Optimize batch sizes: Balance performance vs memory usage
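
For model streaming (item 2), one simple pattern is to create the ONNX Runtime session only when it is needed and drop it afterwards. A minimal sketch, assuming the ONNX file already exists on disk:

import onnxruntime as ort

class StreamedModel:
    """Load the session lazily and release it on demand to free NPU/host memory."""

    def __init__(self, model_path, providers):
        self.model_path = model_path
        self.providers = providers
        self.session = None

    def predict(self, inputs):
        if self.session is None:  # load on first use
            self.session = ort.InferenceSession(self.model_path, providers=self.providers)
        input_name = self.session.get_inputs()[0].name
        return self.session.run(None, {input_name: inputs})

    def unload(self):
        self.session = None  # allow the runtime to release its resources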

Error Handling

  1. Always provide fallbacks: NPU may not always be available (see the sketch after this list)
  2. Handle conversion errors: Some models may not convert properly
  3. Monitor performance: Ensure NPU is actually faster than CPU
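
For item 1, the simplest pattern is to attempt NPU inference and fall back to the original PyTorch model on failure. A minimal sketch, assuming the NPUAcceleratedModel wrapper from the usage examples above (the broad exception handling is for illustration only):

import numpy as np
import torch

def predict_with_fallback(npu_model, pytorch_model, features: np.ndarray):
    """Try NPU inference first; fall back to the plain PyTorch model."""
    try:
        return npu_model.predict(features)
    except Exception as exc:  # e.g. NPU unavailable or conversion failed
        print(f"NPU inference failed ({exc}); falling back to CPU/GPU")
        with torch.no_grad():
            return pytorch_model(torch.from_numpy(features)).numpy()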

Advanced Configuration

Custom ONNX Providers

from utils.npu_detector import get_onnx_providers

# Get available providers
providers = get_onnx_providers()
print(f"Available providers: {providers}")

# Use specific provider order
custom_providers = ['DmlExecutionProvider', 'CPUExecutionProvider']
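
To apply a preferred provider order, pass it when the ONNX Runtime session is created. A minimal sketch (the model path is illustrative; provider names must match what get_onnx_providers() reports as available):

import numpy as np
import onnxruntime as ort

custom_providers = ['DmlExecutionProvider', 'CPUExecutionProvider']

session = ort.InferenceSession(
    "models/your_model.onnx",   # illustrative path
    providers=custom_providers,
)

input_name = session.get_inputs()[0].name
sample = np.random.randn(1, 60, 50).astype(np.float32)
outputs = session.run(None, {input_name: sample})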

Performance Tuning

import onnxruntime as ort

# Enable ONNX Runtime graph optimizations and profiling
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.enable_profiling = True
# Pass these via sess_options when creating the InferenceSession
session = ort.InferenceSession("models/your_model.onnx", sess_options=session_options)

Monitoring and Metrics

Performance Monitoring

# Get detailed performance info
perf_info = npu_model.get_performance_info()
print(f"Providers: {perf_info['providers']}")
print(f"Input shapes: {perf_info['input_shapes']}")

Dashboard Metrics

The dashboard automatically displays:

  • NPU availability status
  • Inference latency
  • Memory usage
  • Provider information

Future Enhancements

Planned Features

  1. Automatic model optimization: Auto-tune models for NPU
  2. Dynamic provider selection: Choose best provider automatically
  3. Advanced benchmarking: More detailed performance analysis
  4. Model compression: Automatic model size optimization

Contributing

To contribute NPU improvements:

  1. Test with your specific models
  2. Report performance improvements
  3. Suggest optimization techniques
  4. Contribute to the NPU acceleration utilities

Support

For issues with NPU integration:

  1. Check the troubleshooting section
  2. Run the integration tests
  3. Check AMD documentation for latest updates
  4. Verify kernel and driver compatibility

Note: NPU acceleration is most effective for inference workloads. Training is still recommended on GPU or CPU. The NPU excels at real-time trading inference where low latency is critical.