gogo2/docs/requirements.md
Dobromir Popov 0331bbfa7c big cleanup
2025-05-30 23:15:41 +03:00


Overview

Our main goal is a multi-modal model with a decision-making module (the orchestrator) that interconnects CNN and RL modules. The inputs will be multi-timeframe and multi-symbol (ETH and BTC, but extendable/configurable) and the outputs will be trading actions. In the RL module's loop we will evaluate the trading actions issued by the orchestrator, closing the loop so the system adapts to the current market environment and learns from past experience. In the CNN training pipeline we should be able to use marked data for backpropagation on the perfect moves in hindsight (when we already know the future move we want to predict). The CNN module will output a prediction for each timeframe, each paired with its own confidence.

RL model

What data does the RL model take as input? In the dash I see: Training Data Stream, Tick Cache: 129 ticks, 1s Bars: 128 bars, Stream: LIVE.

The RL model (and the CNN, and all models) should have the following data available:

  • ETH: up to 300 s of raw tick data (important for detecting single big moves and momentum)
  • ETH: 300 s of 1s OHLCV data (5 min)
  • ETH: 300 OHLCV + indicator bars for each of 1m, 1h, 1d
  • BTC: 1s bars as the reference symbol

The RL model should also have access to the last hidden layers of the CNN model, where patterns are learned (this input can be empty if the CNN model is inactive or missing), as well as the CNN model's outputs: the predictions for each timeframe (1s, 1m, 1h, 1d) and the next expected pivot point.
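A minimal sketch of the RL input container described above. All field names are illustrative (not from the codebase), and the zero-fill sizes for a missing CNN are assumptions:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class RLInput:
    """Everything the RL model sees on one step (illustrative names)."""
    eth_ticks: np.ndarray                       # up to 300 s of raw ticks
    eth_1s: np.ndarray                          # 300 x OHLCV(+indicators) 1s bars
    eth_1m: np.ndarray                          # 300 x 1m bars
    eth_1h: np.ndarray                          # 300 x 1h bars
    eth_1d: np.ndarray                          # 300 x 1d bars
    btc_1s: np.ndarray                          # reference symbol bars
    cnn_hidden: Optional[np.ndarray] = None     # last CNN hidden layer, may be absent
    cnn_predictions: Optional[np.ndarray] = None  # per-timeframe pivot predictions

    def flatten(self) -> np.ndarray:
        """Concatenate all parts into one feature vector.

        A missing CNN (inactive or not yet trained) is replaced with zeros,
        so the RL input shape stays fixed; 64 and 8 are assumed sizes.
        """
        parts = [self.eth_ticks.ravel(), self.eth_1s.ravel(), self.eth_1m.ravel(),
                 self.eth_1h.ravel(), self.eth_1d.ravel(), self.btc_1s.ravel()]
        parts.append(self.cnn_hidden.ravel() if self.cnn_hidden is not None
                     else np.zeros(64))
        parts.append(self.cnn_predictions.ravel() if self.cnn_predictions is not None
                     else np.zeros(8))
        return np.concatenate(parts)
```

The fixed-shape zero fill is what lets "it can be empty if CNN model is not active" work without changing the RL network's input layer.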

CNN model

The CNN model will take the same market data as the RL model, but it will learn patterns and predict the next pivot point for each timeframe (1s, 1m, 1h, 1d). During retrospective training we will programmatically calculate pivot points and compare the CNN model's output against the actual pivot points to measure the model's accuracy. We will have two types of pivot points:

1. Standard pivots for the 1s, 1m, 1h, 1d series of the (primary) ETH and (reference) BTC tickers, each as an array of at most 50 pivot points.
2. Five pivot-point levels recursively calculated from the 1s OHLCV data, following this description: "Let me explain. This logic is based on Larry Williams market structure: swing highs and swing lows. A swing low is a candle with higher lows on both sides of it, and a swing high is a candle with lower highs on both sides of it. By using these swing points I am able to determine the next trend, in my case the blue trend. And by using the blue trend's swing points I am able to determine the purple trend, and so forth. That is the easiest way to determine a trend. As we can see, the orange trend, a little bit smaller than the yellow trend, is up, but the orange trend has just failed its higher-low creation, and the magenta trend here is going down. It is very probable that the magenta trend will create a lower high here above 59.34. If this high breaks, there is still a possibility that the orange trend creates higher lows here and then continues to go higher."

So the shortest-trend pivot points come from the 1s OHLCV data, where a pivot point is defined as a bar with a higher low (or lower high) on both sides of it.
The next trend's pivot points are calculated from the five pivot points of the previous trend.
This gives a recursive pivot-point calculation that will be used to predict the next trend; each level covers a longer and longer horizon.
These pivot points will define the trend direction and the trend strength.

A level-2 pivot should not use a different (bigger) price timeframe; it should use the level-1 pivot points as its candles instead. So a level-2 pivot low is a level-1 pivot low surrounded by higher level-1 pivot lows.
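The recursive rule above can be sketched in plain Python. `swing_lows` and `recursive_pivot_lows` are illustrative names; level 1 runs on the raw 1s lows, and each higher level treats the previous level's pivot lows as its "candles":

```python
def swing_lows(lows):
    """Indices i where lows[i] has a strictly higher low on both sides
    (a Larry Williams swing low)."""
    return [i for i in range(1, len(lows) - 1)
            if lows[i] < lows[i - 1] and lows[i] < lows[i + 1]]

def recursive_pivot_lows(lows, levels):
    """Compute `levels` levels of pivot lows recursively.

    Level 1 uses the raw 1s bar lows; level N+1 finds swing lows among the
    level-N pivot lows (not a bigger price timeframe). Returns, per level,
    a list of (original_bar_index, low) pairs.
    """
    out = []
    idx = list(range(len(lows)))   # map pivots back to original bar indices
    vals = list(lows)
    for _ in range(levels):
        piv = swing_lows(vals)
        idx = [idx[i] for i in piv]
        vals = [vals[i] for i in piv]
        out.append(list(zip(idx, vals)))
    return out
```

Pivot highs are symmetric (lower highs on both sides); a full implementation would run both and tag each pivot with its direction.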

Input should be multi-timeframe, multi-symbol time series with the label of the "chart" included, so the model knows what the secondary time series is. So for the primary symbol (the one we trade, currently ETH):

  • 5 min of raw ticks data
  • 900 of 1s timeseries with common indicators
  • 900 of 1m and 900 of 1h with indicators
  • all the available pivot points (multiple levels)
  • one additional reference symbol (BTC): 5 min of ticks; if there are no ticks, we substitute the 1s (or lowest available) OHLCV data. This is my idea, but I am open to improvement suggestions.

The output of the CNN model should be the next pivot point at each level. Of course, the data must be normalized to the max and min of the highest timeframe, so that the relations between the different timeframes stay the same.
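A minimal sketch of the normalization rule (scale everything by the min/max of the highest timeframe rather than per-series); the function name and the dict-of-frames shape are assumptions:

```python
import numpy as np

# assumed ordering of timeframe keys, in seconds per bar
_TF_SECONDS = {'1s': 1, '1m': 60, '1h': 3600, '1d': 86400}

def normalize_multiframe(frames):
    """Scale every timeframe's prices with the min/max of the highest
    (longest-horizon) timeframe present, so cross-timeframe price
    relations survive normalization.

    `frames` maps a timeframe key ('1s', '1m', ...) to a price array.
    """
    highest_key = max(frames, key=lambda k: _TF_SECONDS[k])
    highest = np.asarray(frames[highest_key], dtype=float)
    lo, hi = float(highest.min()), float(highest.max())
    span = (hi - lo) or 1.0          # guard against a flat series
    return {k: (np.asarray(v, dtype=float) - lo) / span
            for k, v in frames.items()}
```

Normalizing each timeframe independently would instead stretch every series to [0, 1] and destroy the relative scale of short-term moves, which is exactly what this rule is meant to avoid.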

training CNN model

Run CNN training from the dashboard as well: on each pivot point we run inference and pipe the results to the RL model, and we train on the data we got for the previous pivot.
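One detail worth making explicit: a swing pivot needs a bar on each side, so it is only confirmed one bar after it happens. A sketch of the trigger logic (illustrative names; the real loop would call the CNN trainer where `events.append` is):

```python
def pivot_confirmed(lows):
    """True if the bar before the latest one is a confirmed swing low.

    A swing low at bar i needs bar i+1 to close before we can know it,
    so on each new 1s bar we check the second-to-last bar.
    """
    if len(lows) < 3:
        return False
    a, b, c = lows[-3], lows[-2], lows[-1]
    return b < a and b < c

def training_events(lows):
    """Return the bar indices at which a CNN training round would fire,
    i.e. the first bar on which each swing low becomes confirmed."""
    events = []
    for t in range(len(lows)):
        if pivot_confirmed(lows[:t + 1]):
            events.append(t)
    return events
```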

Well, we have sell signals; don't we sell at the exact moment when we hold a long position and execute a sell signal? I see we are now totally invested. Change the model outputs to include a cash signal (or learn to make the decision not to enter a position when we are not certain where the market will go; this way we only enter when the price move is clearly visible and most probable). The model should learn to be less certain after a bad trade (replay both entering and exiting the position). We can do that by storing the model's input data whenever we make a decision and then training with the known outcome. This is why we wanted a central data provider class that prepares the data for all the models, for both inference and training.
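The "store inputs at decision time, train once the outcome is known" idea can be sketched as a small buffer; class and method names are illustrative, and a real version would live in the central data provider:

```python
from collections import deque

class DecisionStore:
    """Hold the model inputs captured at decision time until the trade
    outcome is known, then move them to a replay buffer for training."""

    def __init__(self, maxlen=10_000):
        self.pending = {}                 # decision_id -> (features, action)
        self.replay = deque(maxlen=maxlen)

    def record(self, decision_id, features, action):
        """Called at inference time, before the outcome exists."""
        self.pending[decision_id] = (features, action)

    def settle(self, decision_id, realized_pnl):
        """Called when the position closes: attach the known outcome and
        queue the sample for replay training (both entry and exit
        decisions can be recorded and settled this way)."""
        features, action = self.pending.pop(decision_id)
        self.replay.append((features, action, realized_pnl))
```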

I see we're always invested. Adjust the training and reward functions; use the orchestrator to learn to make that decision when it gets uncertain signals from the expert models. The models should learn to effectively spot market setups with a high risk/reward ratio and act on these.

If that does not work, I think we can make it simpler and easier to train with just two model actions, buy/sell. We don't need a hold signal: until we get an action, we hold. When we are long and we get a sell signal, we close, and we enter short on a consecutive sell signal. Also, we will have different thresholds for entering and exiting; learning to enter only when we are more certain will help us simplify both the training and our codebase and keep it easy to develop. As our models are chained, it no longer makes sense to train them separately, so remove all modes from main_clean and all referenced code; we use only the web mode.
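The two-action scheme with asymmetric thresholds can be written as one small transition function; threshold values and names here are illustrative defaults, not from the codebase:

```python
def decide(signal, confidence, position, enter_th=0.7, exit_th=0.5):
    """Two-action scheme: only 'buy'/'sell' signals exist, hold is implicit.

    position: 1 = long, -1 = short, 0 = flat. A sell while long closes;
    a consecutive sell (now flat) opens a short, symmetrically for buys.
    Entering requires more confidence (enter_th) than exiting (exit_th).
    Returns the new position.
    """
    if signal == 'buy':
        if position == -1 and confidence >= exit_th:
            return 0                  # close the short
        if position == 0 and confidence >= enter_th:
            return 1                  # open a long
    elif signal == 'sell':
        if position == 1 and confidence >= exit_th:
            return 0                  # close the long
        if position == 0 and confidence >= enter_th:
            return -1                 # open a short
    return position                   # below threshold: implicit hold
```

The asymmetry (exit_th < enter_th) is what encodes "learn to enter only when we are more certain" while still letting the system get out of a position on weaker evidence.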

The flow is: we collect data, calculate indicators and pivot points -> CNN -> RL -> orchestrator -> broker/web. We use UnifiedDataStream to collect the data and pass it to the models.

The orchestrator model should also be an appropriate MoE (mixture of experts) model that learns to make decisions based on the signals from the expert models, and it should be able to include more models in the future.
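A minimal gating sketch of that idea: each expert's buy/sell score is weighted by a gate weight times the expert's own confidence, and adding a model later only extends the arrays. Everything here is an assumption; the real orchestrator would learn the gate weights rather than take them as inputs:

```python
import numpy as np

def moe_decide(expert_signals, expert_confidences, gate_weights):
    """Combine expert outputs into one decision score.

    expert_signals:     per-expert score in [-1, 1] (+1 buy, -1 sell)
    expert_confidences: per-expert self-reported confidence in [0, 1]
    gate_weights:       per-expert learned trust in [0, 1]

    Returns a combined score: > 0 leans buy, < 0 leans sell,
    near 0 means stay out (the 'cash' case).
    """
    scores = np.asarray(expert_signals, dtype=float)
    conf = np.asarray(expert_confidences, dtype=float)
    w = np.asarray(gate_weights, dtype=float)
    denom = float(np.sum(conf * w)) or 1.0   # guard: all-zero confidence
    return float(np.sum(scores * conf * w) / denom)
```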

DASH

Also, implement risk management (stop loss). Make all dashboard processes run on the server, without needing the dashboard page to be open in a browser. Add a Start/Stop toggle on the dash to control them, but all processes should happen on the server; the dash is just a way to display and control them. Auto-start when we start the web server.

All model training and inference should run on the server; the dashboard should be used only for displaying the data and controlling the processes. Let's add a start/stop button to the dashboard to control the processes, plus a slider to adjust the buy/sell thresholds of the orchestrator model and thereby bias the aggressiveness of the model's actions.

Add a row of small charts to the dashboard showing all the data we feed to the models: the 1m, 1h, 1d and reference (BTC) OHLCV series.

PROBLEMS

Also, tell me which CNN model is used in the /web/dashboard.py training pipeline right now, and what are its inputs/outputs?

The CNN model should predict the next pivot point and the timestamp at which it will happen, for each of the pivot-point levels that we feed it. Do we do that now, do we train the model, and what is the current loss?

overview/overhaul

But why do the classes in the training folder define their own models? They should use the models defined in the NN folder; no wonder I see no progress in training. Audit the whole project and remove redundant implementations. As described, we should have a single point where data is prepared: the data provider class. It also calculates indicators and pivot points and caches OHLCV data at different timeframes to reduce load and external API calls. The web UI and the CNN model then consume that data in inference mode, but when a pivot is detected we run a training round on the CNN. The CNN outputs, plus part of its hidden-layer state, are passed to the RL model, which generates buy/sell signals. The orchestrator (a MoE gateway of sorts) then takes the data from both the CNN and the RL model and generates its own output. Actions are shown on the dash and executed via the brokerage API.
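A sketch of the caching side of that single data-preparation point; the class name, the injected `fetch_fn`, and the TTL value are all illustrative assumptions:

```python
import time

class DataProvider:
    """Single point of data preparation: fetches OHLCV bars and caches
    them per (symbol, timeframe) so the web UI, CNN, and RL model can all
    read the same data without repeated external API calls.

    `fetch_fn(symbol, timeframe)` is injected so the same class can serve
    a live exchange client or recorded data in backtests.
    """

    def __init__(self, fetch_fn, ttl_seconds=1.0):
        self.fetch_fn = fetch_fn
        self.ttl = ttl_seconds
        self._cache = {}        # (symbol, timeframe) -> (fetched_at, bars)

    def get_bars(self, symbol, timeframe):
        key = (symbol, timeframe)
        now = time.monotonic()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # fresh cache hit: no external call
        bars = self.fetch_fn(symbol, timeframe)
        self._cache[key] = (now, bars)
        return bars
```

Indicator and pivot calculation would hang off the same class (computed once per fresh fetch and cached alongside the bars), so every model and the web UI see identical prepared data.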