Machine Learning with algoTraderJo

  • Post #521
  • Quote
  • May 30, 2015 6:52am | Edited 7:04am
  •  jcl366
  • | Joined May 2015 | Status: Member | 4 Posts
Quoting algoTraderJo
Disliked
A few things: It does not work on the daily timeframe (the system trains each example using hourly returns), this system requires use of the hourly timeframe and all the specified degrees of freedom. You can test this yourself.
Ignored
Yes. What I meant was that the MAE/MFE data are taken over the hours of a business day with the parameters (A ~ 9, D ~ 8) tested by PipMeUP.

Quoting algoTraderJo
Disliked
You cannot simply "fix and eliminate" parameters after a mining exercise. Finding a system through a complicated mining technique and then reducing parameters to find the same system on a smaller space is fooling yourself. When you reduce the mining space around an already known profitable system you are creating additional mining bias. When you do data-mining bias you need to use the same technique that originally produced the system you are evaluating, the process on random data needs to have the exact same freedom. There is no way around doing...
Ignored
I would generally agree, but disagree in this case. You're right when data mining is used just for fiddling with arbitrary parameters until you get the best system. But you can also use it for identifying a certain, explainable market inefficiency. A system must in any case be based on an inefficiency in the real world, or it would not work in real trading. Once you have identified it, you can limit the mining job to systems that exploit that very inefficiency - here, patterns in the daily market opening prices. I think you can then limit the parameter space to the opening and closing hours of this market.

Of course this would be different when you had just found some strange values for your parameters A and D, with no real explanation for them.

For measuring data mining bias: Do you have a plot of the performance distribution of the systems that have been checked by the genetic selection process? And a similar plot, but with shuffled and detrended price data?
 
 
  • Post #522
  • Quote
  • May 30, 2015 7:25am | Edited 7:38am
  •  algoTraderJo
  • Joined Dec 2014 | Status: Member | 412 Posts
Quoting PipMeUp
Disliked
{quote} Ah ok. English language tends to be too weak sometimes. Like the word "random" for which I found more than 10 different translations in French (!). Do I understand correctly? 1. Curve fitting bias is about the "degree of freedom" (complexity) of your model (and is what I understood indeed) => like the order of a polynomial to fit a set of points. 2. Data mining bias is about the relevance of the variables => like using the level of the sea in Venice to estimate the price of the bread in London.
Ignored
I think there might still be some confusion. It's probably easier to understand biases when you put them in question form. Here are some hopefully better explanations:

curve fitting bias = does my system contain rules that describe some general market characteristics?

This question cannot be answered, because we don't know what these "general market characteristics" are; since we don't know the real underlying market model, we cannot know how close to or far away from it we are. However, the more data we use (longer data sets and more symbols), the more we reduce the curve fitting bias, since with more data we should approach the "real market model".

data mining bias = could the trading performance from the rules found by my system be a mere consequence of the mining process?

This question can be answered, because the number of systems generated by the mining process always equals the number of systems due to spurious relationships plus the number due to real relationships. The number of systems expected from spurious relationships can be estimated in several ways. One example is to carry out the mining process on many random data sets fabricated by bootstrapping with replacement from the original time series. This preserves all time-series characteristics (such as the distribution of returns) except any real relationship between past and future bars. We can then build a hypothesis test using the distribution of systems from the random data sets and compare it with the real distribution. If we expect on average 1 system from spurious relationships within some statistical performance group and we get 100 systems on the real data, then we could say the probability that our result comes from chance alone is about 1% (1/100).
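The counting argument above can be sketched in a few lines (a minimal illustration of the idea, not algoTraderJo's actual code; the function name and the capping at 1.0 are my own):

```python
def chance_probability(real_count, random_counts):
    """Estimate the probability that the above-threshold systems found
    on the real data are a mere product of the mining process: divide
    the average count obtained by mining many randomised datasets by
    the count obtained on the real data."""
    expected_spurious = sum(random_counts) / len(random_counts)
    return min(1.0, expected_spurious / real_count)

# 200 randomised datasets yield ~1 spurious system each on average,
# while the real data yields 100 systems in the same performance group:
print(chance_probability(100, [1] * 200))  # 0.01, i.e. about 1%
```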

I hope this answers it better. I am sorry if I am not the best writer!

Take into account that the above are my definitions of curve fitting bias and data mining bias; some people use the terms interchangeably or the other way around. For the sake of keeping this discussion clear I will address them as stated above.
 
 
  • Post #523
  • Quote
  • May 30, 2015 7:25am
  •  algoTraderJo
  • Joined Dec 2014 | Status: Member | 412 Posts
Quoting fajst_k
Disliked
Just to make it even more simple what i understand curve fitting bias - if you have just 3 historical data points you can always fit some model to it but it will not generalize well as it catched pattern just form those points nothing more data mining bias - you can fit a lot of models to those 3 points and one by luck will generalize well for a while due to luck. Like a flipping the coin, if you try enough times you can always get serie of ten heads but still this result was just luck. So AlgoTradeJo. Do you measure data mining bias or not finally...
Ignored
Yes, I measure data mining bias. Yes, I have explained the procedure several times already within the thread.
 
 
  • Post #524
  • Quote
  • May 30, 2015 7:32am
  •  algoTraderJo
  • Joined Dec 2014 | Status: Member | 412 Posts
Quoting jcl366
Disliked
{quote} Yes. What I meant was that the MAE/MFE data are taken over the hours of a business day with the parameters (A ~ 9, D ~ 8) tested by PipMeUP. {quote} I would generally agree, but disagree in this case. You're right when you use data mining just for fiddling with arbitrary parameters until you got the best system. But you can also use it for identifying a certain, explainable market inefficiency. A system must be based anyway on an inefficiency in the real world or it would not work in real trading. If you have identified it, you can limit...
Ignored
In my experience this is a mistake. Reducing the parameter space from the original mining process on any sort of pretext leads to an underestimation of the mining bias. At least every time I have done this I have had really disastrous results. But this is simply from my experience; perhaps you will make it work.

For measuring the bias: I create an average distribution of the performance metric I use for system selection (for example the Sharpe ratio), constructed from all the mining procedures done on random data sets (I never use fewer than 200 sets, and when doing genetics I do the mining at least 5 times per set). The random data sets are created using bootstrapping with replacement; I do not do shuffling and detrending. In my experience detrending is a bad idea, because much of the probability of performing well by chance often derives from drifts within the data. For example, if you mine systems on the S&P500 you get a lot of profitable results out of randomness due to the natural tendency of the index to go up. Detrending is IMHO a mistake, as you eliminate this effect.
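The point about drift can be illustrated numerically (a toy sketch with made-up numbers, not the author's code): a random series with positive drift scores a positive Sharpe-like ratio even though there is no exploitable past-to-future relationship, and detrending erases exactly that effect.

```python
import random
import statistics

def sharpe_like(returns):
    """Mean over standard deviation of per-bar returns (no annualization)."""
    sd = statistics.pstdev(returns)
    return statistics.mean(returns) / sd if sd else 0.0

def detrend(returns):
    """Subtract the mean so the series has zero net drift."""
    m = statistics.mean(returns)
    return [r - m for r in returns]

rng = random.Random(0)
drifted = [rng.gauss(0.001, 0.01) for _ in range(5000)]  # pure noise + drift
print(sharpe_like(drifted) > 0.02)                # True: drift alone looks "good"
print(abs(sharpe_like(detrend(drifted))) < 1e-9)  # True: detrending removes it
```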

There are many ways to do this test, from the days of White's reality check you'll find many similar techniques.
 
 
  • Post #525
  • Quote
  • May 31, 2015 8:11pm | Edited Jun 1, 2015 2:21pm
  •  babelproofre
  • | Joined Oct 2009 | Status: Member | 23 Posts
PipMeUp asked
Quote
Disliked
Which basic null hypothesis can you offer us? What makes you believe the H0 itself is relevant?
Well, the two offered by Aronson in his book are:

  1. The expected return from a "best performing system" equals zero
  2. The "best performing system" is devoid of predictive power, i.e. is randomly correlated with future market behaviour

These are different H0s and the sampling distribution for each is created differently. In 1) above the market returns are detrended and the system returns on this detrended stream of returns are adjusted to be centred over zero, whilst in 2) the test is applied without any adjustment in the returns. The reason for the difference is simply that the sampling distribution of the H0 must reflect the null hypothesis being tested, and 1) and 2) are testing two distinctly different null hypotheses. In this regard I disagree with algoTraderJo: whether to detrend or not is a function of the test being applied and is not intrinsically good or bad, except where the null hypothesis requires detrending and it is not done, or in some cases vice versa.
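A minimal sketch of a test of null hypothesis 2) (my own toy illustration of the principle, not Aronson's procedure): permute the system's positions against the realised returns to build the null distribution of "randomly correlated" performance.

```python
import random
import statistics

def mean_return(signals, returns):
    """Average captured return when holding position signal (+1/0/-1)."""
    return statistics.mean(s * r for s, r in zip(signals, returns))

def permutation_pvalue(signals, returns, n=500, seed=0):
    """H0 #2: the signals are randomly aligned with future returns.
    Shuffling the signals relative to the returns simulates that null."""
    observed = mean_return(signals, returns)
    rng = random.Random(seed)
    shuffled = list(signals)
    hits = 0
    for _ in range(n):
        rng.shuffle(shuffled)
        if mean_return(shuffled, returns) >= observed:
            hits += 1
    return (hits + 1) / (n + 1)
```

A prescient signal beats nearly all of its own shuffles and gets a tiny p-value; a constant signal cannot beat any shuffle of itself, so its p-value is 1.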

Similar to the argument about detrending or not, there is also the choice of randomisation method, which could be any of:

  1. stationary bootstrap
  2. block bootstrap
  3. random permutation
  4. Markov chain

which will again depend on the null hypothesis being tested.

The whole point about these different choices of tests is that there is not a single, universally applicable test for data mining bias that must be passed, but rather the test(s) to be applied is(are) dependent on the system and how it was arrived at.
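As a concrete example of option 2 above, a block bootstrap can be sketched like this (my own minimal version; real implementations differ in how they handle block boundaries and block-length randomisation):

```python
import random

def block_bootstrap(series, block_len, rng):
    """Resample contiguous fixed-length blocks with replacement, so
    short-range serial dependence inside each block is preserved while
    longer-range structure is destroyed."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]
```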

Edit: from a comment on my blog, an interesting link http://www.timothymasters.info/Permutation_tests.html

 
 
  • Post #526
  • Quote
  • Jun 1, 2015 5:46am
  •  PipMeUp
  • Joined Aug 2011 | Status: Member | 1,305 Posts
Quoting dreamvalley
Disliked
Hi everyone. I have been following this thread for a while and would like to share my findings. Hopefully this will make things more useful and consistent. I have attached some files. The first one is the E/U time series pre-processed from what PipMeUp uploaded to correct the time to GMT+1/+2 using DST. I have checked that the price agrees more or less (+/- 1 pip) with the trading records uploaded by algoTraderJo. I also uploaded a Matlab script test_params.m that computes MLE and MSE with parameters A = 9, B = 110, C = 4, D = 8 to compare with...
Ignored
I don't get the same thing as you do. But I think the difference comes from lines 39-40 in your script:
Inserted Code
mle(j) = hi - op;
mse(j) = op - lo;
I use the % return:
Inserted Code
mle(j) = (hi - op)/op;
mse(j) = (op - lo)/op;
I'm happy to see that you also get negative values in your forecasts.
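In Python terms, the percentage version of the excursion targets might look like this (a sketch of the formula only; the function and variable names are mine, and aggregating over a horizon of bars is my reading of the system description):

```python
def pct_excursions(entry_open, highs, lows):
    """Maximum long and short excursion over a horizon of bars,
    expressed as a fraction of the entry open (the % variant):
    mle = (max high - open) / open, mse = (open - min low) / open."""
    mle = (max(highs) - entry_open) / entry_open
    mse = (entry_open - min(lows)) / entry_open
    return mle, mse

# e.g. entry at 1.0, horizon highs/lows over two bars:
mle, mse = pct_excursions(1.0, [1.01, 1.03], [0.99, 0.98])  # roughly (0.03, 0.02)
```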
Attached File(s)
File Type: txt pipmeup_dukascopy_results_9_110_4_8.txt   120 KB | 173 downloads
File Type: txt pipmeup_fxcm_results_9_110_4_8.txt   136 KB | 170 downloads
No greed. No fear. Just maths.
 
 
  • Post #527
  • Quote
  • Jun 1, 2015 3:29pm | Edited 3:43pm
  •  dreamvalley
  • | Joined Jul 2014 | Status: Member | 8 Posts
Quoting PipMeUp
Disliked
{quote} I don't get the same thing as you do. But I think the difference comes from the lines 39-40 in your script: mle(j) = hi - op; mse(j) = op - lo; I use the % return: mle(j) = (hi - op)/op; mse(j) = (op - lo)/op; I'm happy the see that you also get negative values in your forecasts. {file} {file}
Ignored
Thanks PipMeUp for pointing out the difference. I have fixed these 2 lines and run the results again. Also, I was using US DST in my last file when in fact I should have been using EU DST, and a double rounding error caused some lines to be missing in the H1 data. This has also been fixed in the attached file. Now my results for MLE and MSE agree to within 1e-5 of your FXCM results most of the time. This suggests that our implementations are most likely correct!

I also noticed that in your FXCM results the line for "01/08/14 09:00" was missing compared to mine. Otherwise our sets of timestamps agree perfectly. I think this timestamp was in the "eurusd-h1.txt" file you uploaded, which I used, and presumably came from FXCM as well. Not sure why this line is missing from your FXCM results?

Also, given that the difference in data providers can lead to a fairly big difference in the final asset curve produced, I worry that the results of this linear learning approach are not very stable. AlgoTraderJo, what do you think about this?
Attached File(s)
File Type: zip findings20150601.zip   890 KB | 167 downloads
 
 
  • Post #528
  • Quote
  • Jun 1, 2015 3:57pm
  •  PipMeUp
  • Joined Aug 2011 | Status: Member | 1,305 Posts
Quoting dreamvalley
Disliked
I was using DST in the US in my last file, when in fact I should be using the DST in EU
Ignored
That's why I prefer using the name of the timezone "Europe/Paris". The library (joda time) does the rest.
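The same idea in Python (a sketch using the standard-library zoneinfo module, not the joda-time code being discussed): naming the zone lets the library apply the correct DST rule for each date, instead of hard-coding GMT+1/+2.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

paris = ZoneInfo("Europe/Paris")
winter = datetime(2015, 1, 15, 12, 0, tzinfo=paris)  # CET
summer = datetime(2015, 7, 15, 12, 0, tzinfo=paris)  # CEST
print(winter.utcoffset())  # 1:00:00
print(summer.utcoffset())  # 2:00:00
```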

Quoting dreamvalley
Disliked
I also notice that in your FXCM results, the line for "01/08/14 09:00" was missing compared to my results. Otherwise our set of timestamps agree perfectly.
Ignored


Quoting dreamvalley
Disliked
Also given that the difference in data providers can lead to a somewhat big difference in the final asset curve produced, This makes me worry that the results using this linear learning approach is not very stable. AlgoTraderJo, what do you think about this?
Ignored
The trading fee models are different. FXCM adds a premium on top of the spread with no commission. Dukascopy adds no premium to the spread but charges a commission when the trade is opened and another when it is closed. These commissions aren't reflected in the tick data; you only see a small spread. I think that after the commissions are deducted the results are roughly the same.
No greed. No fear. Just maths.
 
 
  • Post #529
  • Quote
  • Jun 3, 2015 8:32pm
  •  FXEZ
  • Joined Jan 2007 | Status: developing... | 970 Posts
Quoting algoTraderJo
Disliked
{quote} curve fitting bias = does my system contain rules that describe some general market characteristics?

data mining bias = could the trading performance from the rules found by my system be a mere consequence of the mining process?
Ignored
Thanks for this great summary. It makes a lot of sense. Can this be thought of in terms of the bias/variance tradeoff as well? It seems that the optimization process naturally tends to produce low bias (high performing) systems. Additionally the data mining bias test filters out high bias and probably a lot of medium bias systems, leaving us with mostly low bias (high performance) systems according to our fitness metric / randomly reshuffled data tests.

So the problem of model generalization essentially boils down to a variance reduction exercise. Variance-reduction techniques, such as using more data, along with other more involved methods, should be effective in improving model generalization. When the variance is reduced, a model should generalize better on future data, particularly if the bias is not increased in the process.

Is this how you see it also? To me the problem of model generalization seems to be the most important question and ultimate goal of model building, as this is the hard right edge of trading profits / losses.
 
 
  • Post #530
  • Quote
  • Jun 3, 2015 9:19pm
  •  gururise
  • | Joined May 2015 | Status: Junior Member | 2 Posts
Hello Guys,

Been following this thread. I want to thank AlgoTraderJo and the rest of you for the very insightful comments. I've re-implemented AlgoTraderJo's initial algorithms (using historical ups & downs to predict tomorrow's movement) with limited success in the stock market; however, I know this thread is dedicated to Forex, so my question is:

Do you think that, instead of using MLE and MSE as the outputs, using the ZigZag indicator would produce better results? The ZigZag could be a -1 for buy, 0 for hold and a 1 for sell.
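One way to build such -1/0/+1 targets (my own crude stand-in for a ZigZag-style label, with a hypothetical `threshold` parameter; note that it looks into the future, so it is only valid as a training target, never as a live input):

```python
def swing_labels(closes, threshold):
    """Label each bar by the first future move of at least `threshold`
    (fractional): +1 if price first rises that far, -1 if it first
    falls that far, 0 if neither happens before the data ends."""
    labels = []
    for i, c in enumerate(closes):
        label = 0
        for future in closes[i + 1:]:
            if future >= c * (1 + threshold):
                label = 1
                break
            if future <= c * (1 - threshold):
                label = -1
                break
        labels.append(label)
    return labels

print(swing_labels([100, 101, 110, 100, 90], 0.05))  # [1, 1, -1, -1, 0]
```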
 
 
  • Post #531
  • Quote
  • Jun 4, 2015 3:40am
  •  PipMeUp
  • Joined Aug 2011 | Status: Member | 1,305 Posts
Quoting FXEZ
Disliked
{quote} Thanks for this great summary. It makes a lot of sense. Can this be thought of in terms of the bias/variance tradeoff as well? It seems that the optimization process naturally tends to produce low bias (high performing) systems. Additionally the data mining bias test filters out high bias and probably a lot of medium bias systems, leaving us with mostly low bias (high performance) systems according to our fitness metric / randomly reshuffled data tests. So the problem of model...
Ignored
To me it still doesn't make a lot of sense. I also thought it was about the bias/variance tradeoff, but algoTraderJo replied to me that it wasn't.
I thought it was about the determination and relevance of the independent variables, but algoTraderJo replied that it wasn't that either.
And if it is not the bias, not the variance and not the relevance of the inputs, I have no idea what it is.

Especially this sentence, "data mining bias = could the trading performance from the rules found by my system be a mere consequence of the mining process?", is strange to me. After all, the meta-parameters A, B, C and D are found by taking the best set among all possible values (or approximately, using a GA) according to a metric like Sharpe or Sortino. This means the parameters are chosen using the entire available dataset. The value of the vector (A,B,C,D) used for the backtest is known in 1991 thanks to data available in 2015!
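The look-ahead problem described here is commonly addressed with walk-forward selection, where parameters are re-chosen at each step using only data available at that point. A minimal sketch (my own, with hypothetical `fit`/`evaluate` callables standing in for the mining and backtest steps):

```python
def walk_forward(series, train_len, test_len, fit, evaluate):
    """Fit parameters on a trailing window, then score them on the
    next, unseen window; slide forward and repeat. The backtest in
    each test window only ever uses parameters chosen from its past."""
    results = []
    i = train_len
    while i + test_len <= len(series):
        params = fit(series[i - train_len:i])          # past data only
        results.append(evaluate(params, series[i:i + test_len]))
        i += test_len
    return results
```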
No greed. No fear. Just maths.
 
 
  • Post #532
  • Quote
  • Jun 8, 2015 12:12pm | Edited 5:03pm
  •  babelproofre
  • | Joined Oct 2009 | Status: Member | 23 Posts
An interesting online article here: http://arstechnica.co.uk/science/201...correlation/1/; substitute "data mining bias" for "correlation" and it shows how difficult things can be when trying to account for it.
 
 
  • Post #533
  • Quote
  • Jun 8, 2015 4:36pm
  •  gururise
  • | Joined May 2015 | Status: Junior Member | 2 Posts
It seems almost any algorithm one comes up with using historical data is prone to data mining bias. As far as I can tell, one can never truly eliminate this problem when working from historical data. But even if one cannot directly measure data-mining bias, you can get an idea of how much it affects the results: using data-mining methods to come up with an algorithm that turns out to be robust and tends to make money regardless of parameter settings would be ideal, as this would suggest that the algorithm is capable of generalization and may have learned to exploit recurring patterns in the data. An algorithm that is brittle and highly dependent on optimized parameter values would suggest a high degree of data-mining bias.

Now, the question is what kind of results are people getting with AlgotraderJo's latest algorithm? Is the algorithm highly dependent upon optimized parameter values for (A,B,C,D)? I think having an optimized parameter for A makes sense. Unfortunately, I only have forex data back to 2011, so I am unable to run the full backtest.
 
 
  • Post #534
  • Quote
  • Jun 9, 2015 6:05am
  •  jcl366
  • | Joined May 2015 | Status: Member | 4 Posts
In fact you can, to some extent, directly measure data mining bias. There are several methods; one of them is White's Reality Check, which you can google.

Unfortunately, all those methods require running a lot of systems to generate a distribution. Especially with machine learning systems this takes a lot of time, which makes measuring data mining bias a very slow procedure.
 
 
  • Post #535
  • Quote
  • Jun 9, 2015 7:16pm | Edited Jun 10, 2015 6:04pm
  •  dfreeze
  • | Joined May 2015 | Status: Member | 11 Posts
Kind of a random interjection here, but Jonathan Kinlay has done some interesting work with Hidden Markov Models. HMMs have some deep theory that makes them really robust compared to ANNs. His results look nice.

On Kinlay's blog:

Regime-Switching & Market State Modeling
A Practical Application of Regime Switching Models to Pairs Trading

other:

Hidden Markov Models for Dummies
 
 
  • Post #536
  • Quote
  • Jun 10, 2015 4:18am
  •  ampleparking
  • | Joined May 2015 | Status: Member | 16 Posts
Quoting algoTraderJo
Disliked
On each day, at a given trading hour (A) build a machine learning model using linear regression. To train the model look at the past (B) days at the same trading hour, then look at the past (C) hourly bar returns as inputs and use the maximum long excursion (MLE) and maximum short excursion (MSE) for the (D) bars after the hour signal as target (it is crucial to train for both targets at the same time). The past (C) bar returns are the inputs, the MSE/MLE with a horizon (D) are the targets and the number of examples trained is B. As I have always...
Ignored
I have some questions about the "borders":

  1. When I look at the "past (C) hourly bar returns", is the return between the last "full candle" open price and the "current, incomplete candle" open price included? I think so.
  2. When I look at the "(D) bars after the hour signal" (the MSE/MLE horizon), do I consider the high/low prices of the "current candle" for calculations? Is the "current candle" inside the horizon? I think so.

Thanks.

 
 
  • Post #537
  • Quote
  • Jun 10, 2015 12:51pm
  •  KaBo
  • | Joined Dec 2014 | Status: Member | 32 Posts
Quoting ampleparking
Disliked
{quote} I have some questions about the "borders": When I look at the "past (C) hourly bar returns", is the return between the last "full candle" open price and the "current, incomplete candle" open price included? I think so. When I look at the "(D) bars after the hour signal" (the MSE/MLE horizon), do I consider the high/low prices of the "current candle" for calculations? Is the "current candle" inside the horizon? I think so. Thanks.
Ignored
Yes and yes.
You always make the trading decision at the open of a candle, so the rest of that candle is still ahead of you, including its high and low, which fall inside the projected horizon.
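The index bookkeeping described here can be written down explicitly (my own sketch of that layout, not code from the thread; `i` is the bar whose open triggers the signal):

```python
def signal_windows(opens, highs, lows, i, C, D):
    """For a decision taken at opens[i]: the C input returns end with
    the move opens[i-1] -> opens[i] (the last fully closed move), and
    the D-bar excursion horizon includes bar i itself, whose high and
    low are still ahead at the moment of the open."""
    inputs = [(opens[k + 1] - opens[k]) / opens[k] for k in range(i - C, i)]
    mle = (max(highs[i:i + D]) - opens[i]) / opens[i]
    mse = (opens[i] - min(lows[i:i + D])) / opens[i]
    return inputs, mle, mse
```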
 
 
  • Post #538
  • Quote
  • Jun 10, 2015 3:25pm
  •  KaBo
  • | Joined Dec 2014 | Status: Member | 32 Posts
Quoting algoTraderJo
Disliked
This basic technique is already a generic mechanism to create profitable machine learnings algos across almost the entire forex space. Of course, this is of little use without the mining bias assessment (which is veeery critical) but the map of how to get to the final strategies for live trading should now be more than clear to those who listen... Or is it not?
Ignored
Yes, and I am getting very exciting results now (LinReg -> NN). They are so good that it is hard to believe they are real - I will double- and triple-check, as we have all been down that road before ;-)

I did a lot of testing. It does seem to make a difference whether you take one target (MLE - MSE) or two targets (MLE & MSE) in the calculation. The results seem better with two targets. I don't know why, but that is how it looks.

The returns of bars seem to be the best inputs. I always thought that indicator values, slopes like LinReg or MAs, HHV/LLV relations, etc. should be much better inputs, but I still get the best results with bar returns. Even with normalization and all kinds of transformations, the plain bar returns still work best for me right now.

A question about the bootstrapping with replacement: is that simply taking all the bar returns (open to open) and switching them around randomly? Basically shaking the 100,000 bars while keeping the beginning and end points? Or is there more logic behind it, e.g. keeping the switching within some time constraint (month/year)?
Right now I figure that shuffling the returns serves the purpose.
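The two resampling schemes in question differ in a small but important way (a sketch with my own function names): shuffling is a permutation in which every bar return appears exactly once, so the rebuilt price path keeps the same start and end points; bootstrapping with replacement re-draws each bar independently, so some returns repeat, others drop out, and the total return of the resampled path generally changes.

```python
import random

def shuffled_returns(returns, rng):
    """Permutation: the exact same multiset of returns, new order,
    so the sum (and hence the path's end point) is preserved."""
    out = list(returns)
    rng.shuffle(out)
    return out

def bootstrapped_returns(returns, rng):
    """With replacement: each position is an independent draw, so the
    resample generally repeats some returns and omits others."""
    return [rng.choice(returns) for _ in returns]
```

Both destroy any real past-to-future relationship while keeping the marginal return distribution (the shuffle exactly, the bootstrap on average).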
 
 
  • Post #539
  • Quote
  • Jun 10, 2015 3:32pm
  •  surfeur
  • | Joined Jan 2008 | Status: Member | 194 Posts
Quoting dfreeze
Disliked
Kind of a random interjection here, but Jonathan Kinlay has done some interesting work with Hidden Markov Models. HMMs have some deep theory that makes them really robust compared to ANNs and regressions. His results look nice. On Kinlay's blog: Regime-Switching & Market State Modeling A Practical...
Ignored
Do you think HMMs are good for nonlinear time series?
 
 
  • Post #540
  • Quote
  • Jun 10, 2015 4:22pm
  •  Neilg
  • | Joined Sep 2013 | Status: Member | 4 Posts
Enjoying this thread very much.

algoTraderJo,

The majority of your time has been spent explaining/showing ML on daily charts. Do you have any successful strategies/systems on M1, M5, M15 or M30 charts? What are the differences involved in creating lower timeframe ML systems?

 
 
Forex Factory® is a brand of Fair Economy, Inc.

Terms of Service / ©2023