Overview

Our main goal is a multi-modal model in which a decision-making module (the orchestrator) is interconnected with the CNN and RL modules. The inputs will be multi-timeframe and multi-symbol (ETH and BTC, but extendable/configurable) and the outputs will be trading actions. In the RL module loop we will evaluate the orchestrator's trading actions to close the loop, adapt to the current market environment and learn from past experience. In the CNN training pipeline we should be able to use marked (labeled) data to backpropagate on the perfect moves we should have made, looking at past data (where we already know the future move we want to predict). The CNN module will output a prediction for each timeframe, each paired with its own confidence.

RL model

What data does the RL model take as input? In the dash I see: "Training Data Stream", "Tick Cache: 129 ticks", "1s Bars: 128 bars", "Stream: LIVE".

RL (and CNN, and all models) should have the following data available:
ETH:
300s max of raw tick data - this is important for detecting single big moves and momentum
300s of 1s OHLCV data (5 min)
300 bars of OHLCV + indicators for each of 1m, 1h, 1d and 1s BTC
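A minimal sketch of how the 300s tick window and the 300-bar caches could be kept bounded. The class name `TickWindow` and its fields are hypothetical, not names from our codebase, and the tick layout `(timestamp, price, volume)` is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TickWindow:
    """Hypothetical helper: keeps ticks from the last `max_age_s` seconds
    and at most `max_bars` 1s bars."""
    max_age_s: float = 300.0
    max_bars: int = 300
    ticks: deque = field(default_factory=deque)
    bars_1s: deque = field(default=None)

    def __post_init__(self) -> None:
        # deque(maxlen=...) silently drops the oldest bar when full.
        self.bars_1s = deque(maxlen=self.max_bars)

    def add_tick(self, ts: float, price: float, volume: float) -> None:
        self.ticks.append((ts, price, volume))
        # Drop ticks older than the window, measured from the newest tick.
        while self.ticks and ts - self.ticks[0][0] > self.max_age_s:
            self.ticks.popleft()

    def add_bar(self, bar: tuple) -> None:
        # bar is assumed to be (ts, open, high, low, close, volume)
        self.bars_1s.append(bar)
```

The same window object could be shared by every model through the central data provider, so all of them see identical inputs.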

The RL model should also have access to the last hidden layers of the CNN model, where the patterns are learned. This can be empty if the CNN model is not active or missing. It should also receive the output (predictions) of the CNN model for each timeframe (1s 1m 1h 1d) and the next expected pivot point.
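One way the RL input could be assembled so that a missing CNN does not change the input size is sketched below. `build_rl_state`, `HIDDEN_SIZE` and the zero-fill convention are assumptions made for illustration, not an existing API.

```python
from typing import Dict, List, Optional, Sequence, Tuple

TIMEFRAMES = ("1s", "1m", "1h", "1d")
HIDDEN_SIZE = 8  # hypothetical width of the CNN's last hidden layer


def build_rl_state(
    market_features: Sequence[float],
    cnn_hidden: Optional[Sequence[float]] = None,
    cnn_predictions: Optional[Dict[str, Tuple[float, float]]] = None,
) -> List[float]:
    """Concatenate market data, CNN hidden features and CNN outputs.

    When the CNN is inactive or missing, its slots are zero-filled so the
    RL model always receives a vector of the same length.
    """
    if cnn_hidden is None:
        hidden = [0.0] * HIDDEN_SIZE
    else:
        hidden = list(cnn_hidden)
    if len(hidden) != HIDDEN_SIZE:
        raise ValueError("unexpected CNN hidden layer size")

    predictions = cnn_predictions or {}
    flat_predictions: List[float] = []
    for timeframe in TIMEFRAMES:
        # Each entry is assumed to be (prediction, confidence).
        prediction, confidence = predictions.get(timeframe, (0.0, 0.0))
        flat_predictions.extend([prediction, confidence])

    return list(market_features) + hidden + flat_predictions
```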

CNN model

The CNN model will take the same market data as the RL model, but it will learn patterns and predict the next pivot point for each timeframe (1s 1m 1h 1d). During retrospective training we will programmatically calculate pivot points and compare the output of the CNN model with the actual pivot points to calculate the accuracy of the model. We will have 2 types of pivot points:
1: standard for the 1s 1m 1h 1d (primary) ETH and (reference) BTC tickers, each as an array of 50 pivot points max.
2: 5 pivot points recursively calculated for 1s OHLCV data, following this description:

"Let me explain. This logic is based on Larry Williams market structure. Swing high, swing lows. This is swing low. Swing low is candle with higher lows on the both side of it. And swing high is a candle with lower highs on the both side of it. By using these swing points, I'm able to determine next trend. In my case, blue trend. And by using blue trend swing points, I'm able to determine purple trend and so forth. And that's the easiest way to determine a trend. As we can see orange trend that is a little bit smaller trend that yellow trend is up but orange trend failed its higher low creation right now and as we can see magenta trend here is going down. It's very probable that magenta trend is going to create lower high here above 59,34. If this high breaks then there is a still a possibility that this orange trend creates higher lows here and then continues to go higher."

So the first, shortest-trend pivot points come from the 1s OHLCV data, where a pivot point is defined as a bar with higher lows (or lower highs) on both sides of it.
The next trend's pivot points are calculated from THE FIVE PIVOT POINTS OF THE PREVIOUS TREND.
This way we get a recursive pivot point calculation that will be used to predict the next trend. Each trend will be more and more long-term.
These pivot points will define the trend direction and the trend strength.

A level 2 pivot should not use a different (bigger) price timeframe, but should use the level 1 pivot points as candles instead. So a level 2 pivot low is a level 1 pivot low that is surrounded by higher level 1 pivot lows.
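The recursive calculation described above could be sketched roughly as follows. This is only an illustration under assumptions: the names `Pivot`, `level1_pivots`, `next_level_pivots` and `recursive_pivots` are hypothetical, a swing is detected with one bar on each side, "5" is read as five levels, and each level uses all pivots of the level below.

```python
from typing import List, NamedTuple, Sequence


class Pivot(NamedTuple):
    index: int    # position of the source bar in the 1s series
    price: float
    kind: str     # "high" or "low"


def level1_pivots(highs: Sequence[float], lows: Sequence[float]) -> List[Pivot]:
    """Swing low: higher lows on both sides. Swing high: lower highs on both sides."""
    pivots: List[Pivot] = []
    for i in range(1, len(highs) - 1):
        if lows[i] < lows[i - 1] and lows[i] < lows[i + 1]:
            pivots.append(Pivot(i, lows[i], "low"))
        if highs[i] > highs[i - 1] and highs[i] > highs[i + 1]:
            pivots.append(Pivot(i, highs[i], "high"))
    return pivots


def next_level_pivots(pivots: Sequence[Pivot]) -> List[Pivot]:
    """Treat the previous level's pivots as candles: a pivot low survives
    when the neighbouring pivot lows are higher, and likewise for highs."""
    out: List[Pivot] = []
    for kind in ("low", "high"):
        same = [p for p in pivots if p.kind == kind]
        for i in range(1, len(same) - 1):
            prev, cur, nxt = same[i - 1], same[i], same[i + 1]
            if kind == "low" and cur.price < prev.price and cur.price < nxt.price:
                out.append(cur)
            if kind == "high" and cur.price > prev.price and cur.price > nxt.price:
                out.append(cur)
    return sorted(out, key=lambda p: p.index)


def recursive_pivots(
    highs: Sequence[float], lows: Sequence[float], levels: int = 5
) -> List[List[Pivot]]:
    """Return one pivot list per level, from shortest to longest trend."""
    result = [level1_pivots(highs, lows)]
    for _ in range(levels - 1):
        result.append(next_level_pivots(result[-1]))
    return result
```

Higher levels thin out quickly, so on short windows the top levels may simply be empty lists.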

Input should be a multi-timeframe and multi-symbol timeseries with the label of the "chart" included, so the model knows what the secondary timeseries is. So for the primary symbol (the one we trade, now ETH):

5 min of raw ticks data
900 of 1s timeseries with common indicators
900 of 1m and 900 of 1h with indicators
all the available pivot points (multiple levels)
one additional reference symbol (BTC) - 5 min of ticks

If there are no ticks, we substitute them with 1s or the lowest available OHLCV data. This is my idea, but I am open to improvement suggestions. The output of the CNN model should be the next pivot point in each level. Of course, data must be normalized to the max and min of the highest timeframe, so the relations between different timeframes stay the same.
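The normalization rule can be illustrated with a small sketch. The function name `normalize_by_highest_timeframe` and the dict-of-lists layout are assumptions. Every timeframe is scaled with the same min and max, taken from the highest timeframe, so a price maps to the same value no matter which series it appears in.

```python
from typing import Dict, List, Sequence


def normalize_by_highest_timeframe(
    prices_by_timeframe: Dict[str, Sequence[float]], highest_timeframe: str
) -> Dict[str, List[float]]:
    """Min-max scale all series using the range of the highest timeframe."""
    reference = prices_by_timeframe[highest_timeframe]
    low, high = min(reference), max(reference)
    span = high - low
    if span == 0:
        raise ValueError("highest timeframe has no price range")
    return {
        timeframe: [(price - low) / span for price in prices]
        for timeframe, prices in prices_by_timeframe.items()
    }


scaled = normalize_by_highest_timeframe(
    {"1h": [100.0, 200.0], "1m": [150.0, 175.0]}, highest_timeframe="1h"
)
```

Values from a lower timeframe can fall slightly outside 0..1 if its window contains a price the highest timeframe's window does not cover. Whether to clip those is left open here.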

Training CNN model

Run CNN training from the dashboard as well - on each pivot point we run inference and pipe the results to the RL model, and train on the data we got for the previous pivot.

Well, we have sell signals. Don't we sell at the exact moment when we have a long position and execute a sell signal? I see now we're totally invested. Change the model outputs to include a cash signal (or learn to decide not to enter a position when we're not certain where the market will go; this way we will only enter when the price move is clearly visible and most probable). Learn to not be so certain when we made a bad trade (replay both entering and exiting the position). We can do that by storing the model's input data when we make a decision and then training with the known output. This is why we wanted to have a central data provider class which prepares the data for all the models for inference and training.
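Storing the inputs at decision time and attaching the known outcome later could look roughly like this. `DecisionRecord` and `DecisionLog` are hypothetical names, and using the signed price change as the outcome is an assumption rather than our actual reward function.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DecisionRecord:
    features: List[float]           # exact model input at decision time
    action: str                     # "BUY" or "SELL"
    price: float                    # price when the decision was made
    outcome: Optional[float] = None  # filled in once the result is known


class DecisionLog:
    def __init__(self) -> None:
        self.records: List[DecisionRecord] = []

    def record(self, features: Sequence[float], action: str, price: float) -> int:
        if action not in ("BUY", "SELL"):
            raise ValueError(f"unknown action: {action}")
        self.records.append(DecisionRecord(list(features), action, price))
        return len(self.records) - 1  # id used to resolve the record later

    def resolve(self, record_id: int, later_price: float) -> None:
        rec = self.records[record_id]
        change = (later_price - rec.price) / rec.price
        # A BUY is good when price rose, a SELL is good when price fell.
        rec.outcome = change if rec.action == "BUY" else -change

    def training_samples(self) -> List[Tuple[List[float], str, float]]:
        """Only decisions whose outcome is already known."""
        return [
            (r.features, r.action, r.outcome)
            for r in self.records
            if r.outcome is not None
        ]
```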

I see we're always invested. Adjust the training, reward functions and possibly model outputs to include a CASH signal where we sell our positions and stay off the market. Or use the orchestrator to learn to make that decision when it gets uncertain signals from the expert models. Models should learn to effectively spot setups in the market which have a high risk/reward level and act on these.

Also, implement risk management (stop loss). Make all dashboard processes run on the server without the need for the dashboard page to be open in a browser. Add a Start/Stop toggle on the dash to control it, but all processes should happen on the server and the dash is just a way to display and control them. Auto start when we start the web server.
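For the stop loss part, a minimal sketch of a server-side check that does not depend on the dashboard being open. The function name `stop_loss_hit` and the percentage-based rule are assumptions. The actual risk rule is not specified here.

```python
def stop_loss_hit(
    side: str, entry_price: float, last_price: float, max_loss_pct: float
) -> bool:
    """Return True when the open position has lost at least `max_loss_pct`
    (a fraction, e.g. 0.02 for 2%) relative to the entry price."""
    if side == "long":
        loss = (entry_price - last_price) / entry_price
    elif side == "short":
        loss = (last_price - entry_price) / entry_price
    else:
        raise ValueError(f"unknown side: {side}")
    return loss >= max_loss_pct
```

A check like this would run on every new tick or 1s bar inside the server loop, and close the position when it returns True.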

If that does not work, I think we can make it simpler and easier to train if we have just 2 model actions: buy/sell. We don't need a hold signal, as until we have an action we hold. When we are long and we get a sell signal, we close, and we enter short on a consecutive sell signal. Also, we will have different thresholds for entering and exiting. Learning to enter only when we are more certain will also help us simplify the training and our codebase and keep it easy to develop. As our models are chained, it does not make sense anymore to train them separately. So remove all modes from main_clean and all referenced code. We use only web mode, where the flow is: we collect data, calculate indicators and pivot points -> CNN -> RL -> orchestrator -> broker/web
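The two-action idea with separate entry and exit thresholds amounts to a small state machine, sketched below. The threshold values and the names `Position` and `next_position` are placeholders, not tuned or existing settings.

```python
from enum import Enum


class Position(Enum):
    FLAT = "flat"
    LONG = "long"
    SHORT = "short"


ENTRY_THRESHOLD = 0.7  # hypothetical: need more certainty to enter
EXIT_THRESHOLD = 0.4   # hypothetical: exit on weaker evidence


def next_position(position: Position, signal: str, confidence: float) -> Position:
    """Apply one BUY/SELL signal. No explicit hold: we hold until a signal
    passes the relevant threshold."""
    if signal not in ("BUY", "SELL"):
        raise ValueError(f"unknown signal: {signal}")

    if position is Position.FLAT:
        if confidence >= ENTRY_THRESHOLD:
            return Position.LONG if signal == "BUY" else Position.SHORT
        return position

    opposite = (position is Position.LONG and signal == "SELL") or (
        position is Position.SHORT and signal == "BUY"
    )
    if opposite and confidence >= EXIT_THRESHOLD:
        return Position.FLAT  # close first; a further signal may re-enter
    return position


# Long -> sell closes -> a consecutive sell enters short.
state = Position.LONG
state = next_position(state, "SELL", 0.8)
state = next_position(state, "SELL", 0.8)
```

With the exit threshold below the entry threshold, a moderately confident opposite signal takes us to cash without immediately flipping the position, which also addresses the "always invested" problem.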
