Quote:
You train your K-NN or SVM using the past N examples, all of which have X bar directions as inputs and the target bar's direction as output. Using just the data you have posted you would build the examples:
0 0 => 0
0 0 => 1
0 1 => 0
0 1 => 1
1 0 => 0
1 0 => 1
1 1 => 0
1 1 => 1
Using examples like this you train the model on each bar and then you look for a prediction using the last data available. The last examples use 3 bars as input for the SVM and 4 bars as input for the K-NN. I use the open source Shark...
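To make the windowing concrete, here is a minimal sketch (my own illustration, not the quoted poster's code) that turns a series of bar directions into such (previous N directions => next direction) rows:

```java
import java.util.ArrayList;
import java.util.List;

public class BarExamples {
    /** Builds (previous n directions => next direction) rows from a
     *  direction series, where 0 = down bar and 1 = up bar. */
    static List<int[]> buildExamples(int[] directions, int n) {
        List<int[]> examples = new ArrayList<>();
        for (int i = n; i < directions.length; i++) {
            int[] row = new int[n + 1];
            // inputs: the n directions preceding bar i
            System.arraycopy(directions, i - n, row, 0, n);
            // target: the direction of bar i itself
            row[n] = directions[i];
            examples.add(row);
        }
        return examples;
    }

    public static void main(String[] args) {
        int[] dirs = {0, 0, 1, 0, 1, 1, 0, 1};   // toy series
        for (int[] row : buildExamples(dirs, 2))
            System.out.println(row[0] + " " + row[1] + " => " + row[2]);
    }
}
```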
I personally use Java. Weka seems good and features a standalone GUI for quick-and-dirty testing.
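For example, a k-NN classifier in Weka could look roughly like this (a hypothetical sketch assuming Weka 3.x on the classpath; the attribute names and training rows are made up):

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.classifiers.lazy.IBk;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class WekaKnnSketch {
    public static void main(String[] args) throws Exception {
        // Two nominal inputs (previous bar directions) plus the nominal class.
        ArrayList<String> upDown = new ArrayList<>(Arrays.asList("0", "1"));
        ArrayList<Attribute> attrs = new ArrayList<>();
        attrs.add(new Attribute("bar1", upDown));
        attrs.add(new Attribute("bar2", upDown));
        attrs.add(new Attribute("target", upDown));

        Instances data = new Instances("bars", attrs, 0);
        data.setClassIndex(2);

        // Toy training rows: (bar1, bar2) => target; values are nominal indices.
        double[][] rows = {{0,0,1},{0,1,0},{1,0,1},{1,1,0},{0,0,0},{1,1,1}};
        for (double[] r : rows) data.add(new DenseInstance(1.0, r));

        IBk knn = new IBk(3);                     // k = 3 nearest neighbours
        knn.buildClassifier(data);

        DenseInstance query = new DenseInstance(1.0, new double[]{1, 0, 0});
        query.setDataset(data);                   // attach the schema to the query
        double idx = knn.classifyInstance(query);
        System.out.println("prediction: " + data.classAttribute().value((int) idx));
    }
}
```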
There is no need for a full-blown ML library for a trivial k-NN anyway. A VP-tree does the job.
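For illustration, a minimal VP-tree with k-NN search might look like this (my own sketch, unrelated to Shark or Weka; it assumes a metric such as Hamming distance on direction vectors):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Minimal vantage-point tree for k-NN queries under an arbitrary metric. */
public class VpTree {
    interface Metric { double dist(int[] a, int[] b); }

    private static final class Node {
        int[] vantage;          // point stored at this node
        double radius;          // median distance: splits inside/outside
        Node inside, outside;
    }

    private final Metric metric;
    private final Node root;

    VpTree(List<int[]> points, Metric metric) {
        this.metric = metric;
        this.root = build(new ArrayList<>(points));
    }

    private Node build(List<int[]> pts) {
        if (pts.isEmpty()) return null;
        Node n = new Node();
        n.vantage = pts.remove(pts.size() - 1);   // arbitrary vantage point
        if (pts.isEmpty()) return n;
        // Partition the rest by median distance to the vantage point.
        pts.sort(Comparator.comparingDouble(p -> metric.dist(p, n.vantage)));
        int mid = pts.size() / 2;
        n.radius = metric.dist(pts.get(mid), n.vantage);
        n.inside  = build(new ArrayList<>(pts.subList(0, mid)));
        n.outside = build(new ArrayList<>(pts.subList(mid, pts.size())));
        return n;
    }

    /** Returns the k nearest neighbours of q, closest first. */
    List<int[]> knn(int[] q, int k) {
        // Max-heap on distance to q: peek() is the worst of the k best so far.
        PriorityQueue<int[]> best = new PriorityQueue<>(
            Comparator.comparingDouble((int[] p) -> metric.dist(p, q)).reversed());
        search(root, q, k, best);
        List<int[]> out = new ArrayList<>(best);
        out.sort(Comparator.comparingDouble(p -> metric.dist(p, q)));
        return out;
    }

    private void search(Node n, int[] q, int k, PriorityQueue<int[]> best) {
        if (n == null) return;
        double d = metric.dist(q, n.vantage);
        best.add(n.vantage);
        if (best.size() > k) best.poll();               // evict the current worst
        boolean in = d < n.radius;
        search(in ? n.inside : n.outside, q, k, best);  // likelier side first
        double tau = metric.dist(best.peek(), q);       // k-th best distance so far
        // Triangle inequality: the other side can only help if the query
        // ball of radius tau crosses the splitting shell.
        if (best.size() < k || Math.abs(d - n.radius) <= tau)
            search(in ? n.outside : n.inside, q, k, best);
    }

    public static void main(String[] args) {
        Metric hamming = (a, b) -> {
            int d = 0;
            for (int i = 0; i < a.length; i++) if (a[i] != b[i]) d++;
            return d;
        };
        List<int[]> data = List.of(
            new int[]{0,0}, new int[]{0,1}, new int[]{1,0}, new int[]{1,1});
        VpTree tree = new VpTree(data, hamming);
        for (int[] p : tree.knn(new int[]{1,1}, 3))
            System.out.println(java.util.Arrays.toString(p));
    }
}
```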
I still don't understand. Even with 4 bar directions as inputs you only get 16 combinations, and in the past 200 days of E/U all of them appeared. Any query vector (the 4 previous days' directions) will exactly match the center of one cell of this hypercube, and each cell contains examples of both targets, in different counts.

If you request fewer neighbors than the number of elements in that cell, the result depends on the order in which they are retrieved from the data structure, which is not necessarily random enough to "shuffle" them and give a good result. If you request more neighbors than the cell contains, the result depends even more on the implementation: the neighboring cells may be visited in a fixed order. Say you get (Up, Up) as your query vector. (Up, Down) and (Down, Up) are both 1 unit distant from (Up, Up). A recursive algorithm will visit the first one and take as many elements as needed before moving to the next, so almost all the elements of (Up, Down) could be returned while none of (Down, Up) would.
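To make the arithmetic concrete: 2^4 = 16 cells share roughly 200 examples, about a dozen per cell, and each cell typically mixes both targets. A toy sketch with random directions (hypothetical data, purely to show the per-cell counts):

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class CellCounts {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int days = 200, n = 4;
        int[] dirs = new int[days];
        for (int i = 0; i < days; i++) dirs[i] = rnd.nextInt(2); // toy series

        // Key = the n previous directions packed into bits; value = {downs, ups}.
        Map<Integer, int[]> cells = new TreeMap<>();
        for (int i = n; i < days; i++) {
            int key = 0;
            for (int j = i - n; j < i; j++) key = (key << 1) | dirs[j];
            cells.computeIfAbsent(key, k -> new int[2])[dirs[i]]++;
        }

        // 16 cells share ~196 examples: every cell holds several examples
        // of each target, so any k-NN answer inside a cell is a tie-break.
        for (Map.Entry<Integer, int[]> e : cells.entrySet()) {
            String cell = String.format("%4s", Integer.toBinaryString(e.getKey()))
                                .replace(' ', '0');
            System.out.println("cell " + cell + "  down=" + e.getValue()[0]
                               + "  up=" + e.getValue()[1]);
        }
    }
}
```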
I see k-NN and SVM as more useful when the training samples are more scattered over their area of influence.
No greed. No fear. Just maths.