From 3b8016e16067904edc2d34cbc24344134a9f56e9 Mon Sep 17 00:00:00 2001
From: Dobromir Popov
Date: Tue, 4 Feb 2025 17:36:33 +0200
Subject: [PATCH] comments

---
 crypto/brian/design.md | 67 ++++++++++++++++++++++++++++++++++++++++++
 crypto/brian/readme.md |  5 ++++
 2 files changed, 72 insertions(+)
 create mode 100644 crypto/brian/design.md

diff --git a/crypto/brian/design.md b/crypto/brian/design.md
new file mode 100644
index 0000000..14b8e90
--- /dev/null
+++ b/crypto/brian/design.md
@@ -0,0 +1,67 @@

The proposed architecture effectively addresses synchronizing multiple timeframes and indicators through channel-specific processing and transformer-based attention. Here is how to enhance it further, along with the key considerations.

Key Enhancements & Considerations:

Timeframe Embeddings:
Why: Helps the model distinguish between different resolutions (e.g., 1m vs. 1h).
How: Add an embedding layer that encodes timeframe IDs (e.g., 0 for 1m, 1 for 5m) and inject it into each channel's processed features.

```python
self.timeframe_embed = nn.Embedding(num_timeframes, hidden_dim)

# In forward:
timeframe_ids = torch.arange(num_timeframes).to(device)  # built during init or passed as input
embeddings = self.timeframe_embed(timeframe_ids).unsqueeze(0)  # (1, num_timeframes, hidden_dim)
channels_concat = channels_concat + embeddings  # added to the branch outputs
```

Positional Encodings in the Transformer:
Why: Informs the model about the order/hierarchy of timeframes.
How: Apply sinusoidal positional encodings to transformer_input before passing it to the encoder (see the sinusoidal-encoding sketch at the end of this note).

Dynamic Aggregation:
Why: Mean aggregation may dilute important signals.
How: Use attention pooling to weight the channels dynamically:

```python
# transformer_output has shape (num_timeframes, batch_size, hidden_dim)
attn_weights = torch.softmax(self.attn_pool(transformer_output), dim=0)  # self.attn_pool is a learnable linear layer
aggregated_channels = (transformer_output * attn_weights).sum(dim=0)
```

Multi-Step Prediction:
Why: Predicting multiple future candles requires temporal coherence.
How: Replace the final fc layer with an LSTM or transformer decoder that outputs sequences (see the decoder sketch at the end of this note).

Meta Feature Interaction:
Why: Allow the meta features to modulate channel processing.
How: Use cross-attention between the meta features and the channel data:

```python
# Cross-attention in which meta_out attends to aggregated_channels
cross_attn = nn.MultiheadAttention(hidden_dim, n_heads)
attn_output, _ = cross_attn(meta_out.unsqueeze(0),
                            aggregated_channels.unsqueeze(0),
                            aggregated_channels.unsqueeze(0))
combined = torch.cat([attn_output.squeeze(0), meta_out], dim=1)
```

Data Alignment Strategy:
Preprocessing: Resample the higher timeframes to match the lowest resolution's timestamps (e.g., 1m), forward-filling the 5m/1h data. This ensures every channel has a value at every 1m timestamp (see the pandas sketch at the end of this note).
Example: For a 1h channel, every 1m timestamp within the same hour shares the same 1h candle values until the next hour begins.

Loss Function:
Enhancement: Incorporate a custom loss that penalizes prediction errors at peaks and valleys more heavily:

```python
def peak_valley_loss(pred_highs, true_highs, pred_lows, true_lows):
    high_error = torch.abs(pred_highs - true_highs) * 2  # emphasize peaks
    low_error = torch.abs(pred_lows - true_lows) * 2     # emphasize valleys
    return (high_error + low_error).mean()
```

Implementation Adjustments:
Transformer Layers: Increase the depth (e.g., num_layers=2) for richer interactions.
Regularization: Add dropout in the channel branches and the meta encoder to prevent overfitting.
Input Normalization: Apply instance normalization per channel to handle varying indicator scales.

RL Integration (Future Step):
Use the model's predictions as part of the state representation in an RL agent (e.g., PPO or DQN).
Design a reward function based on trading profitability (e.g., Sharpe ratio, portfolio returns).
Include transaction costs and slippage in the RL environment for realistic backtesting.
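
Code Sketches (illustrative only, not yet wired into the existing code):

Sinusoidal positional encoding: a minimal module matching the (num_timeframes, batch_size, hidden_dim) layout used above. The module name and the assumption that transformer_input follows that layout are mine, not taken from the existing code.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Adds fixed sinusoidal positional encodings to a (seq_len, batch_size, hidden_dim) tensor."""

    def __init__(self, hidden_dim, max_len=512):
        super().__init__()
        # Assumes hidden_dim is even.
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)            # (max_len, 1)
        div_term = torch.exp(torch.arange(0, hidden_dim, 2).float()
                             * (-math.log(10000.0) / hidden_dim))                      # (hidden_dim / 2,)
        pe = torch.zeros(max_len, hidden_dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(1))                                    # (max_len, 1, hidden_dim)

    def forward(self, x):
        # x: (seq_len, batch_size, hidden_dim); here seq_len == num_timeframes
        return x + self.pe[: x.size(0)]

# Hypothetical usage before the encoder:
# transformer_input = SinusoidalPositionalEncoding(hidden_dim)(transformer_input)
```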
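
Multi-step prediction head: a minimal LSTM-decoder sketch for the "Multi-Step Prediction" item. The class name, the horizon argument, and the (high, low) output pair are assumptions for illustration, not part of the current model.

```python
import torch
import torch.nn as nn

class MultiStepHead(nn.Module):
    """Predicts `horizon` future (high, low) pairs from one aggregated context vector."""

    def __init__(self, hidden_dim, horizon=5):
        super().__init__()
        self.horizon = horizon
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, 2)  # one (high, low) pair per future step

    def forward(self, context):
        # context: (batch_size, hidden_dim), e.g. the `combined` vector from the design above
        steps = context.unsqueeze(1).repeat(1, self.horizon, 1)   # feed the context at every step
        out, _ = self.decoder(steps)                              # (batch_size, horizon, hidden_dim)
        return self.proj(out)                                     # (batch_size, horizon, 2)
```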
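
Data alignment: a minimal pandas sketch of forward-filling 1h candles onto a 1m index, as described under "Data Alignment Strategy". The frame names, column suffixing, and dropna handling are hypothetical choices, not taken from the existing pipeline.

```python
import pandas as pd

def align_to_1m(df_1m: pd.DataFrame, df_1h: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill 1h candles onto the 1m timestamps so every channel has a value per 1m bar."""
    # Both frames are assumed to use a sorted DatetimeIndex with OHLCV columns.
    df_1h_on_1m = df_1h.reindex(df_1m.index, method="ffill").add_suffix("_1h")
    # Rows before the first available 1h candle have nothing to fill, so drop them.
    return pd.concat([df_1m, df_1h_on_1m], axis=1).dropna()
```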

diff --git a/crypto/brian/readme.md b/crypto/brian/readme.md
index fb93545..80afa33 100644
--- a/crypto/brian/readme.md
+++ b/crypto/brian/readme.md
@@ -35,3 +35,8 @@ we're trying to create a 8b neural network (ai) that will consume live and histo

we're stuck, and the code needs fixing.
The NN should have one task: to predict the next low/high on the short-term charts (1m, 5m, or other - configurable), based on all the past info in parallel from all the different timeframes' candles and all the passed indicators' candles. It should also have a dedicated NN module to discover and pay attention to specific parts of the charts - building and training its own indicator, in a sense. We later use the predicted future 5m/1h high/low to buy now and sell later, or to short now and close later, in the bot. We will have a certainty threshold to act, and we will act only if the predictions from multiple timeframes align. So we may use a transformer module to predict future candles and train it with RL while the candles are rolling, until the NN can predict with a small loss.
existing (running but unfinished) code:

--------
implement these suggestions into our code and add arguments for easy switching of modes (a sketch follows below):
- train (only): pool the latest data and use it for backtesting with RL, learning to detect peaks/valleys
- live: load the best checkpoint and the latest OHLCV data to actively generate trade signals, but calculate and backpropagate errors when positions are closed; optimize for profit in the reward function
- inference: optimize model loading for inference only - load historical data, periodically append new live data, and generate signals without active RL
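
A minimal sketch of the requested mode switch, assuming an argparse-based entry point; the run_training/run_live/run_inference functions, the default checkpoint path, and the flag names are placeholders for the existing (unfinished) code, not functions that already exist.

```python
import argparse

def run_training(args):   # placeholder: pool the latest data, backtest with RL to learn peaks/valleys
    raise NotImplementedError

def run_live(args):       # placeholder: trade from the best checkpoint, backpropagate when positions close
    raise NotImplementedError

def run_inference(args):  # placeholder: load once, append new candles, generate signals without RL
    raise NotImplementedError

def main():
    parser = argparse.ArgumentParser(description="Multi-timeframe high/low predictor")
    parser.add_argument("--mode", choices=["train", "live", "inference"], default="train",
                        help="train: RL backtesting on pooled history; "
                             "live: trade from the best checkpoint and backpropagate on closed positions; "
                             "inference: signals only, no RL updates")
    parser.add_argument("--checkpoint", default="checkpoints/best.pt",
                        help="checkpoint to load in live/inference modes (hypothetical default path)")
    parser.add_argument("--timeframes", nargs="+", default=["1m", "5m", "1h"],
                        help="candle resolutions fed to the channel branches")
    args = parser.parse_args()

    if args.mode == "train":
        run_training(args)
    elif args.mode == "live":
        run_live(args)
    else:
        run_inference(args)

if __name__ == "__main__":
    main()
```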