The solution is as follows:
- Construct N random datasets with the same statistical characteristics as B. Bootstrapping (resampling bars with replacement) works well: it preserves most statistical properties of the data except the relationships between bars, so the random data contains no exploitable inefficiencies.
- Perform the exact same mining process A on all random datasets (i.e., run the same optimization process, attempting to find a system as good as X).
- Create an average distribution of system results from mining A across all N datasets.
- The distribution must be stable, so use a large enough N; I generally run 500+ mining exercises on random data to build it. This requires a lot of computing power. The F4 framework supports OpenMP/MPI, so I have been able to run validations on clusters without much additional coding.
- Using this distribution, determine the probability that you found X in B due to random chance.
- You can then accept X as not derived from spurious relationships if this probability falls below your chosen significance threshold. I generally only accept strategies at a 4-sigma level (roughly 99.99% confidence).
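The steps above can be sketched in Python. This is a toy illustration, not the F4 framework: `mine`, `mining_bias_pvalue`, and the candidate-strategy scoring inside `mine` are all hypothetical stand-ins for the real optimization process A.

```python
import random
import statistics

def mine(returns, n_candidates=200, rng=random):
    # Stand-in for the real mining process A: try many candidate
    # "strategies" and keep the best score found. Here a candidate is
    # just a random subset of bars scored by its mean return; trying
    # many candidates and keeping the best is exactly what creates
    # data-mining bias.
    best = float("-inf")
    n = len(returns)
    k = max(1, n // 4)
    for _ in range(n_candidates):
        picks = [returns[rng.randrange(n)] for _ in range(k)]
        best = max(best, statistics.fmean(picks))
    return best

def mining_bias_pvalue(returns, n_random=500, seed=0):
    rng = random.Random(seed)
    real_score = mine(returns, rng=rng)
    # Bootstrap with replacement: keeps the marginal distribution of
    # the bars but destroys any bar-to-bar relationships, so the
    # resampled data contains no real inefficiencies.
    random_scores = []
    for _ in range(n_random):
        fake = rng.choices(returns, k=len(returns))
        random_scores.append(mine(fake, rng=rng))
    # Probability that pure random data yields a result at least as
    # good as the one mined from the real data.
    at_least_as_good = sum(s >= real_score for s in random_scores)
    return real_score, at_least_as_good / n_random
```

A strategy would then be accepted only if the returned probability falls below the chosen significance threshold.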
This type of test is what babelproofre has described. It derives from the ideas put forward in the paper on White's reality check but is computationally more expensive. I have used it to evaluate data-mining bias for the past several years, and it is also standard practice at the firm where I work. The process corresponds to the classic methodology for data-mining-bias evaluation (I also advise reading the articles posted by babelproofre; they are classics).
You are, in simple terms, comparing what your mining process (the strategy search) yields on real data with what it yields on average on random data. The mining process needs to generate many more systems of the intended quality on the real data than on the random data; otherwise, your process can find such systems through spurious relationships alone.
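The comparison can be reduced to asking how far the real-data result sits above the random-data distribution. A minimal sketch (the function name and the use of the sample standard deviation are my own choices, not anything from the F4 framework):

```python
import statistics

def z_score(real_score, random_scores):
    # How many standard deviations the real-data mining result sits
    # above the mean of the random-data distribution.
    mu = statistics.fmean(random_scores)
    sd = statistics.stdev(random_scores)
    return (real_score - mu) / sd
```

A result would pass a sigma-based acceptance rule like the one above only when this z-score exceeds the chosen sigma level.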
When doing machine learning you will often find that you cannot rule out that your system simply found spurious relationships: repeating the mining process on random data frequently yields a significant number of strategies with the same statistical properties. The more elaborate your mining method, the more prone you are to this; the larger your genetic optimization, the worse it gets.
Being able to perform the above evaluation is, IMHO, what separates the 99% of people who attempt machine learning and fail from those of us who have been able to do it successfully. It takes time and it is computationally expensive, but you end up with things that work. All of that, of course, is just my personal experience on the matter.