Merge commit 'd49a473ed6f4aef55bfdd47d6370e53582be6b7b' into cleanup

2025-10-01 00:32:19 +03:00
parent a03b9c5701 d49a473ed6
commit 388334e4a8
353 changed files with 81004 additions and 35899 deletions
--- a/_dev/dev_notes.md
+++ b/_dev/dev_notes.md
@@ -84,7 +84,43 @@ use existing checkpoint manager if it;s not too bloated as well. otherwise re-im
 We should load the models so that backpropagation and other model-specific training run in realtime, as training examples emerge from the realtime data we process. We will save only the best examples (the realtime data dumps we feed to the models) so we can cold-start other models if we change the architecture. If that does not work, clean up all training and trainer code to make it easier to work with, to streamline the latest changes, and to simplify and refactor it.


+<<<<<<< HEAD
 Let's also work on the transformer model: we will add a candlestick tokenizer that uses 8-dimensional vectors to represent candlesticks, with 5 dimensions for OHLCV data and 1 each for the timestamp, timeframe and symbol.



+=======
+
+Also, adjust our Bybit API integration so we trade USDT futures, where we can have up to 50x leverage. On spot we can have 10x max.
+
+
+
+
+
+--------------
+
+
+
+
+1. On the dash, the buy/sell buttons do not open/close positions in live mode.
+2. We also need to fix the Current Order Book (COB) data shown on the dash: it is not consistent and definitely not fast/low-latency. Let's store all COB data aggregated to 1s buckets and 0.2s ticks. Show the COB data source update rate.
+3. We don't calculate the COB imbalance correctly: we have an MA with 4 time windows.
+4. We have some more work to do on the model statistics and overview, but we can focus there later, once we fix the other issues.
+
+5. Audit and backtest whether calculate_williams_pivot_points works correctly. Show pivot points on the dash on the 1m candlesticks.
+
+
+
+Can we enhance our RL reward/punishment to promote closing losing trades and keeping winning ones, taking into account the predicted price direction and conviction? For example, the deeper an open position is in loss, the more we should be biased toward closing it. But if the models predict with high certainty that there will be a big move up, we should be more tolerant of a drawdown. And the opposite: we should be inclined to close winning trades, but keep them as long as the price goes up and we project more upside. Do you think there is a smart way to implement that in the current RL and other training pipelines?
+I want it to be part of a proper reward function bias rather than an algorithmic calculation in post-signal processing, as I prefer that this is a behaviour the model learns and adapts to the current conditions, without hard boundaries.
+THINK REALLY HARD
+
+
+Do we evaluate and reward/punish each model at each inference?
+
+
+
+
+In our realtime reinforcement learning training, how do we calculate the score (reward/penalty)?
+Let's use the mean squared difference between the prediction and the empirical outcome. We should do a training run at each inference, using the last inference's prediction and the current price as the outcome. Do that for up to the last 6 predictions, calculating accuracy separately for each, to get a better picture of the ability to predict a couple of timeframes into the future. In addition to the frequent inference every 1 or 5s (I forgot the current CNN rate), do an inference at each new timeframe interval. The model should get the full data (multi-timeframe: ETH (main) 1s, 1m, 1h, 1d, and 1m for BTC, SPX and one more) but should also know which timeframe it is predicting on. We predict only on the main symbol, so in 4 timeframes. But on every hour we will do 4 inferences, one for each timeframe.
+>>>>>>> d49a473ed6f4aef55bfdd47d6370e53582be6b7b
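
A minimal sketch of the scoring idea from the last note, under one possible reading: at each inference, every stored prediction (up to the last 6) is compared with the current price, and the squared error is accumulated separately by how many inferences ago the prediction was made. `HorizonMSETracker` and its method names are hypothetical and not part of the repository, and each prediction is assumed to be a single predicted price.

```python
from collections import deque


class HorizonMSETracker:
    """Running mean squared error per prediction age (hypothetical helper)."""

    def __init__(self, max_horizons: int = 6):
        self.max_horizons = max_horizons
        # Most recent prediction first; older ones fall off the right end.
        self.pending = deque(maxlen=max_horizons)
        self.sq_err_sum = [0.0] * max_horizons
        self.count = [0] * max_horizons

    def on_inference(self, current_price: float, new_prediction: float) -> list:
        """Score stored predictions against current_price, then store the new one.

        Returns the squared errors; index 0 is the previous inference's
        prediction, index 1 the one before it, and so on.
        """
        errors = []
        for h, predicted in enumerate(self.pending):
            err = (predicted - current_price) ** 2
            self.sq_err_sum[h] += err
            self.count[h] += 1
            errors.append(err)
        self.pending.appendleft(new_prediction)
        return errors

    def mse(self) -> list:
        """Mean squared error per age; None where nothing was scored yet."""
        return [s / c if c else None for s, c in zip(self.sq_err_sum, self.count)]
```

The per-inference errors returned by `on_inference` could serve as the training signal for that step, while `mse()` gives the separate per-horizon accuracy picture. One tracker per timeframe (1s, 1m, 1h, 1d) would keep the four prediction streams apart.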