annotate ui phase 1

2025-10-18 16:37:13 +03:00
parent d136f9d79c
commit bc7095308a
26 changed files with 3616 additions and 80 deletions
--- a/_dev/dev_notes.md
+++ b/_dev/dev_notes.md
@@ -23,7 +23,6 @@ fix the dash. it still flickers every 10 seconds for a second. update the chart



-
 >> Training

 how effective is our training? show current loss and accuracy on the chart. also show currently loaded models for each model type
@@ -45,7 +44,6 @@ report and audit rewards and penalties in the RL training pipeline



-
 initial dash loads 180 historical candles, but then we drop them when we get the live ones. all of them instead of just the last. so in one minute we have a 2-candle chart :)
 use existing checkpoint manager if it's not too bloated as well. otherwise re-implement a clean one where we rotate up to 5 checkpoints - best 5 if we can reliably measure performance, otherwise latest 5

@@ -78,25 +76,17 @@ use existing checkpoint manager if it;s not too bloated as well. otherwise re-im



-
-
-
 we should load the models in a way that we do back propagation and other model-specific training in realtime as training examples emerge from the realtime data we process. we will save only the best examples (the realtime data dumps we feed to the models) so we can cold start other models if we change the architecture. if it's not working, perform a cleanup of all training and trainer code to make it easier to work with, to streamline latest changes and to simplify and refactor it


-<<<<<<< HEAD
 let's also work on the transformer model - we will add a candlestick tokenizer that will use 8-dimensional vectors to represent candlesticks: 5 dims for OHLCV data, and 1 each for the timestamp, timeframe and symbol


-
-=======
-
 also, adjust our bybit api so we trade with usdt futures - where we can have up to 50x leverage. on spots we can have 10x max




-
 --------------


@@ -122,5 +112,4 @@ do we evaluate and reward/punish each model at each reference?


 in our realtime Reinforcement learning  training how do we calculate the score (reward/penalty?) 
-Let's use the mean squared difference between the prediction and the empirical outcome. We should do a training run at each inference which will use the last inference's prediction and the current price as outcome. do that for up to the last 6 predictions, calculating accuracy separately for each, to have a better picture of the ability to predict a couple of timeframes into the future. in addition to the frequent inference every 1 or 5s (i forgot the current CNN rate) do an inference at each new timeframe interval. model should get the full data (multi timeframe - ETH (main) 1s 1m 1h 1d and 1m for BTC, SPX and one more) but should also know on what timeframe it is predicting. we predict only on the main symbol - so in 4 timeframes. but on every hour we will do 4 inferences - one for each timeframe
->>>>>>> d49a473ed6f4aef55bfdd47d6370e53582be6b7b
+Let's use the mean squared difference between the prediction and the empirical outcome. We should do a training run at each inference which will use the last inference's prediction and the current price as outcome. do that for up to the last 6 predictions, calculating accuracy separately for each, to have a better picture of the ability to predict a couple of timeframes into the future. in addition to the frequent inference every 1 or 5s (i forgot the current CNN rate) do an inference at each new timeframe interval. model should get the full data (multi timeframe - ETH (main) 1s 1m 1h 1d and 1m for BTC, SPX and one more) but should also know on what timeframe it is predicting. we predict only on the main symbol - so in 4 timeframes. but on every hour we will do 4 inferences - one for each timeframe
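
a rough sketch of the scoring idea in the note above, not the actual trainer code: at each inference, every stored prediction (up to the last 6) is compared to the current price, and the squared error is tracked separately per "steps ago" so we can see how accuracy degrades with horizon. `PredictionScorer`, `on_inference` and `mse` are hypothetical names, and the sketch assumes each prediction is a plain price value.

```python
from collections import deque


class PredictionScorer:
    """Hypothetical helper: scores the last N predictions against the current price."""

    def __init__(self, max_pending=6):
        # deque with maxlen drops the oldest prediction automatically
        self.pending = deque(maxlen=max_pending)
        # steps_ago -> list of squared errors observed at that lag
        self.sq_errors = {}

    def on_inference(self, prediction, current_price):
        # Score stored predictions first: the newest one is 1 step old,
        # the oldest one is up to max_pending steps old.
        for steps_ago, past in enumerate(reversed(self.pending), start=1):
            err = (past - current_price) ** 2
            self.sq_errors.setdefault(steps_ago, []).append(err)
        # Then remember the new prediction for future scoring.
        self.pending.append(prediction)

    def mse(self, steps_ago):
        # Mean squared error for one lag, or None if nothing was scored yet.
        errs = self.sq_errors.get(steps_ago, [])
        return sum(errs) / len(errs) if errs else None


scorer = PredictionScorer()
scorer.on_inference(prediction=100.0, current_price=99.0)
scorer.on_inference(prediction=102.0, current_price=101.0)
scorer.on_inference(prediction=103.0, current_price=104.0)
print(scorer.mse(1), scorer.mse(2))
```

the per-lag MSE could be negated and used as the reward/penalty signal, one scorer per predicted timeframe; that mapping is an assumption, the note does not specify it.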