Machine Learning with algoTraderJo
Hello fellow traders,
I am starting this thread hoping to share with you some of my developments in the field of machine learning. Although I may not share exact systems or coding implementations (don't expect to get anything to "plug-and-play" and get rich from this thread), I will share ideas, results of my experiments and possibly other aspects of my work. I am starting this thread in the hope that we will be able to share ideas and help each other improve our implementations. I will start with some simple machine learning strategies and will then go into more complex stuff as time goes by. Hope you enjoy the ride!
Glad to hear some of you have already subscribed! I hope to make things interesting for you.
I want to start by saying some basic things. I am sorry if the structure of my posts leaves a lot to be desired; I don't have any forum posting experience but hope to gain some with time.
In machine learning, what we want to do is simply to generate a prediction that is useful for our trading. To make this prediction we build a statistical model from a set of examples (known outputs, plus some inputs we think have the power to predict those outputs). We then use the model built from those examples to predict an unknown output (our most recent data).
To sum it up it is a "simple" process where we do the following:
I want to say from the start that it is very important to avoid doing what many academic papers on machine learning do, which is to build a model with very large arrays of examples and then attempt to make a long-term prediction on an "out-of-sample" set. Building a model with 10 years of data and then testing it on the last two is nonsense; it is subject to many types of statistical bias that we will discuss later on.
Most of what I will be mentioning on this thread will focus on answering these questions, with actual examples. If you want, post any questions you might have and I will attempt to give you an answer or simply let you know if I will answer that later on.
Let us get down to business now with a real, practical example using machine learning. Let's suppose we want to build a very simple model using a very simple set of inputs/targets. For this experiment these are the answers to the questions:
This model will attempt to predict the directionality of the next daily bar. To build our model we take the past 200 examples (a day's direction as target and the previous two days' directions as inputs) and we train a linear classifier. We do this at the start of every daily bar. If we have an example where two bullish days lead to a bearish day, the inputs would be 1,1 and the target would be 0 (0=bearish, 1=bullish). We use 200 of these examples to train the model on each bar. We hope to find a relationship where the direction of the previous two days gives an above-random probability of predicting the next day's direction correctly. We use a stoploss equal to 50% of the 20-day Average True Range on every trade.
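To make the input/target structure above concrete, here is a minimal sketch in plain Python. It is not the exact implementation behind the results in this thread. The assumptions are mine: a day counts as bullish when its close is above its open, the linear classifier is a small logistic regression fitted by gradient descent, and the function names, `epochs` and `lr` are hypothetical choices made for illustration.

```python
import math

def directions(opens, closes):
    """1 = bullish day (close above open, an assumption), 0 = bearish otherwise."""
    return [1 if c > o else 0 for o, c in zip(opens, closes)]

def make_examples(dirs, window=200):
    """Inputs = directions of the two previous days, target = the day's direction.
    Only the most recent `window` examples are kept."""
    examples = [((dirs[i - 2], dirs[i - 1]), dirs[i]) for i in range(2, len(dirs))]
    return examples[-window:]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, epochs=500, lr=0.5):
    """Fit weights (w1, w2, b) by batch gradient descent on the log-loss."""
    w1 = w2 = b = 0.0
    n = len(examples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in examples:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def predict(model, x1, x2):
    """Return 1 (bullish) if the fitted probability is at least 0.5, else 0."""
    w1, w2, b = model
    return 1 if sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0
```

On each new daily bar you would rebuild the examples from the latest data, call `train_logistic` again and then call `predict` with the directions of the two most recent days.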
It's interesting to think about what might be wrong in the above example:
The above raises the more interesting "how" questions:
Here is my example which I have thought about previously: Suppose you have observed that price often tends to retrace at some kind of level and now you have decided to backtest to see if you were right.
I have not done such backtests myself, but to me it is intuitively obvious that if you tried to backtest this with a large SL/TP, e.g. 100 pips up and down, what you should get is a very small edge that is very unlikely to offset trading costs.
It is important to understand the limits of your analysis. Well... so you predicted that buyers or sellers would step in. Hmm, but what exactly does that have to do with price going up or down 100 pips? Price can react in various ways - it might just tank for some time (while all limit orders are filled) and then keep moving further. It can also retrace 5, 10, 50 or even 99 pips. In all of these cases you were kinda right about buyers or sellers stepping in, but you must understand that this analysis doesn't have much to do with your trade going from +90 pips to +100 pips.
Consider now that we change the model to a still simple yet more powerful classifier (a K-Nearest Neighbor approach) using the same input/target structure as above (two past days to predict the next day's directionality). However, we now use a stoploss of 70% of the Average True Range (risking 1% per trade) and we train using 70 instead of 200 examples. We still rebuild the model on each daily bar. See how our balance curve changes drastically:
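For readers who have not met K-Nearest Neighbors before, here is a rough sketch of the idea on the same two-direction inputs. The number of neighbors `k` is not given above, so the default here is only a placeholder, and the function name and the choice of Hamming distance are likewise my assumptions. With two binary inputs there are only four possible input patterns, so the vote mostly comes down to what happened after the same pattern in the recent past.

```python
from collections import Counter

def knn_predict(examples, query, k=5):
    """Majority vote among the k training examples closest to `query`.

    `examples` is a list of ((x1, x2), target) pairs, oldest first.
    Distance is the number of input positions that differ (Hamming distance).
    The list is scanned newest first and the sort is stable, so among equally
    distant examples the more recent ones are preferred.
    A tied vote returns 0 (bearish); an odd k avoids ties.
    """
    ranked = sorted(
        reversed(examples),
        key=lambda ex: sum(a != b for a, b in zip(ex[0], query)),
    )
    votes = Counter(target for _, target in ranked[:k])
    return 1 if votes[1] > votes[0] else 0
```

To mimic the setup described above you would pass only the 70 most recent examples and repeat the call at the start of every daily bar.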
We now have something that works much better, with a correlation coefficient of 0.95 between log(balance) and time. However, a question still arises: how do we know the probability that this result is just due to random chance (our model fitting nothing but noise and giving this result spuriously)? What do you think is the effect of changing the number of examples?
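In case it helps, this is one way the correlation figure could be computed from a list of balance values. It is a sketch that assumes the balance is sampled at evenly spaced times, so the position in the list stands in for time; the function name is hypothetical.

```python
import math

def log_balance_correlation(balances):
    """Pearson correlation between log(balance) and the sample index 0, 1, 2, ...

    Assumes every balance is positive and that the balances are not all equal
    (otherwise the variance of log(balance) is zero and the ratio is undefined).
    """
    n = len(balances)
    xs = list(range(n))
    ys = [math.log(b) for b in balances]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A balance that compounds at a perfectly steady rate gives a value of 1, so the closer the number is to 1 the closer the curve is to smooth exponential growth.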
Subscribed and wish you good luck on your journey
Let's say you have a 100 pip TP and SL. I would want to predict which comes first, TP or SL:
TP came first +1
SL came first 0 (or -1, however you map it)
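A rough sketch of how such labels could be generated from the bars that follow an entry, for a long trade. The function name, the pip size of 0.0001 and the handling of bars that touch both levels are my assumptions, not something settled in this thread.

```python
def first_touch_label(entry, highs, lows, tp_pips=100, sl_pips=100, pip=0.0001):
    """Label a long trade: 1 if TP is reached first, 0 if SL is reached first.

    `highs` and `lows` are the bars after the entry, in time order.
    Returns None if neither level is reached, or if both levels fall inside
    the same bar, because the bar alone cannot tell us which came first.
    """
    tp = entry + tp_pips * pip
    sl = entry - sl_pips * pip
    for high, low in zip(highs, lows):
        hit_tp = high >= tp
        hit_sl = low <= sl
        if hit_tp and hit_sl:
            return None
        if hit_tp:
            return 1
        if hit_sl:
            return 0
    return None
```

Lower-timeframe data would be needed to resolve the bars where both levels are touched.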
Too bad the mode of the histogram will be exactly on the current price
This model that I mentioned (if TP comes first = +1, if SL comes first = 0) could also be modeled using logistic regression, but with what predictor variables? I personally don't know.
However, if you mean this:
then I think I would approach this differently. Do you know how chess engines work?
Chess engines are programs that analyse chess positions and give an assessment of the position: -0.25 to +0.25 means the position is around equal, +0.25 to +0.5 means that white is slightly better (likewise -0.25 to -0.5 means black is slightly better), and +1 means that white has an advantage of something around one pawn.
An advantage of more than 1.5 usually means that the leading side is basically winning with perfect play.
One might try something similar here ... trying to assess how good a buy or sell is. And if the assessment at some point in time happens to go clearly in favour of one side.... then it might work as a trigger to open a position. And when it gets back to zero you might as well exit, because you probably don't have an edge anymore.
What probably makes the trading case more difficult is the inputs - there are plenty of them, they are harder to assess and many of them are also hard to turn into code.
BTW, I am not sure if machine learning is involved in the best chess engines. Inputs and their assessments might be only human made.
Appreciate your effort
I think if we know with high probability whether the next candle is bullish or bearish, we can build a thousand successful strategies,
as we would know the first step and the last step with high probability.
Go ahead, my friend algoTraderJo ...
Where do you get the technology to conduct these tests and modules?
© Forex Factory