Relation between binary and continuous variables? (In MatLaB)

#1
First Post: Feb 14, 2011 4:07am

mfurlend
| Joined Apr 2010 | Status: Trader | 165 Posts

I have an EA that I regularly backtest and optimize. I want to determine a relationship between the probability of a given trade being profitable and a certain statistical measure called kurtosis at the time the trade was initiated.

I programmed my EA to generate a .csv file containing each trade's profit/loss alongside the proper kurtosis value.

Excerpt:

PROFIT     KURTOSIS
 -41.82    2.97530011
 -61.82    3.12553938
 -45.82    2.32907776
-100.82    1.9039837
 -14.82    2.35415757
 -30.82    2.23130752
 -75.82    2.76931501
 -62.82    1.60114237
  28.18    3.29114443
  12.18    2.99822995
   6.18    2.27906841
 -76.82    1.59931087
   1.18    6.21691918
  54.18    3.15778944
  24.18    4.98645291
   2.18    2.84275644
  33.18    2.36483006
  18.18    2.17924754
-160.82    1.86165136
  51.18    1.90035084
...
etc

I am not interested in the size of each profit or loss, so I replaced all p/l values with binary values: 1 if the trade was profitable and 0 if it was not.

Excerpt:

PROFIT     KURTOSIS
0          2.97530011
0          3.12553938
0          2.32907776
0          1.9039837
0          2.35415757
0          2.23130752
0          2.76931501
0          1.60114237
1          3.29114443
1          2.99822995
1          2.27906841
0          1.59931087
1          6.21691918
1          3.15778944
1          4.98645291
1          2.84275644
1          2.36483006
1          2.17924754
0          1.86165136
1          1.90035084
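
For reference, the conversion described above could be sketched in MATLAB roughly as follows. The variable names (profit, kurt, won) are made up for illustration, and the data is just the 20-row excerpt typed in by hand, not the full .csv.

```matlab
% The 20 trades from the excerpt: profit/loss and kurtosis at entry.
profit = [-41.82 -61.82 -45.82 -100.82 -14.82 -30.82 -75.82 -62.82 ...
           28.18  12.18   6.18  -76.82   1.18  54.18  24.18   2.18 ...
           33.18  18.18 -160.82  51.18]';
kurt   = [2.97530011 3.12553938 2.32907776 1.9039837  2.35415757 ...
          2.23130752 2.76931501 1.60114237 3.29114443 2.99822995 ...
          2.27906841 1.59931087 6.21691918 3.15778944 4.98645291 ...
          2.84275644 2.36483006 2.17924754 1.86165136 1.90035084]';

% 1 if the trade was profitable, 0 otherwise.
won = double(profit > 0);

% Show the binary outcome next to the kurtosis value.
disp([won kurt]);
```

The hand-typed vectors only keep the example self-contained; with the real file the two columns would be read from the .csv instead.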

Now what I want to do is to determine the relationship (if any).
Normally I would use a polynomial regression, but this makes no sense when one of the variables is binary.

I have read of something called logistic regression, but I can't seem to figure out how to do it with MatLab.

If someone could give me some instructions and possibly some tips on interpreting the results I would really appreciate it. If you know of another way to achieve a relation between the two variables I'd like to hear that too.

#2
Feb 15, 2011 8:31pm | Edited 11:45pm

OldQuant
| Joined Jan 2011 | Status: Trader | 2,078 Posts

Quoting mfurlend

I have an EA that I regularly backtest and optimize. I want to determine a relationship between the probability of a given trade being profitable and a certain statistical measure called kurtosis at the time the trade was initiated.

A couple of things, off the top of my head, with no warranty of usefulness.

What you have done, in effect, is transform your problem into the comparison of two samples: the kurtosis measures (which I'll shorthand as m4 for reasons I'm sure you know) for the "0" group and the "1" group.

Now you have an expectation that the samples - and so, the m4_0 and m4_1 populations as well - are different (whether certain levels of m4 correlate with, or are cointegrated with, profits or losses is probably not your concern).

So you have a classic problem of comparing the properties of two populations. Lots of stats books, as well as R, Stata, et al. (MATLAB too, I assume; I don't use it), have those kinds of tests cook-booked.

The one thing you might want to look up is the standard error of the sample kurtosis for a normal distribution. I know that Kendall and Stuart gave the standard error calcs for all normal moments and cumulants in their classic "Advanced Theory of Statistics".
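
As a rough illustration of that standard error: for normally distributed data, a commonly used large-sample approximation is sqrt(24/n), where n is the number of observations behind each kurtosis value. The sketch below assumes that approximation, and it assumes the EA reports kurtosis on the scale where a normal distribution gives 3 (the convention MATLAB's kurtosis() uses). The window lengths are hypothetical, since the thread does not say how many bars go into each value.

```matlab
% Large-sample approximation for normal data: SE(kurtosis) ~ sqrt(24/n).
n  = [50 100 500];        % hypothetical window lengths, not from the thread
se = sqrt(24 ./ n);       % approximate standard error at each n

% Rough z-score of an observed kurtosis against the normal value of 3.
k = 6.21691918;           % largest kurtosis value in the posted excerpt
z = (k - 3) ./ se;

fprintf('n = %4d  se = %.3f  z = %.2f\n', [n; se; z]);
```

For small n the exact finite-sample expression differs noticeably from sqrt(24/n), so this should be read as an order-of-magnitude check only.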

(BTW: they didn't mean to use 'advanced' in the sense of complicated or high-falut'n. Rather, as proper Englishmen, they used it in the sense of "this is what has been brought forth to us." Should have called it the "Received" Theory of Statistics.)

Anyway, this is kind of an interesting idea. I have no a priori as to what you're going to find, and will be interested in anything you post.

#3
Feb 15, 2011 9:09pm | Edited 9:49pm

jamjamjam
| Joined Apr 2010 | Status: Trader | 96 Posts

This is a typical 'binary' regression problem; aside from logistic regression, there are dozens of machine-learning-type regressions you could apply.
It's common to transform the input data first using some type of normalization (val -> stdev units, for example).

A simple example can be run in R:

http://psychweb.psy.umt.edu/denis/da...c_R/index.html

#4
Feb 15, 2011 10:05pm

mbkennel
| Joined Nov 2009 | Status: Trader | 245 Posts

Yes, logistic regression is an option. The underlying model for logistic regression is that the logarithm of the odds, i.e. log(prob(true)/prob(false)), is a linear function of the inputs. It is the simplest generalization of linear regression to binary targets.
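
Written out for a single input, that model looks like the following. The symbols are chosen here for illustration: p is the probability that a trade is profitable, k is the kurtosis at entry, and the betas are the coefficients to be fitted.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 k)}}
```

So each extra unit of k multiplies the odds p/(1-p) by exp(beta_1), which is the usual way to read the fitted slope.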

But if you are trying to see if there is ANY statistically significant relationship, then what you want may be even simpler.

Try a t-test, or a Mann-Whitney test on the continuous values, split by profitable or not profitable.
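
A minimal sketch of both tests in MATLAB, assuming the Statistics Toolbox is installed. It uses only the 20 rows posted at the top of the thread, so the p-values it prints say little about the full data set; the variable names are made up for illustration.

```matlab
% Kurtosis at entry and win/loss flag for the 20 posted trades.
kurt = [2.97530011 3.12553938 2.32907776 1.9039837  2.35415757 ...
        2.23130752 2.76931501 1.60114237 3.29114443 2.99822995 ...
        2.27906841 1.59931087 6.21691918 3.15778944 4.98645291 ...
        2.84275644 2.36483006 2.17924754 1.86165136 1.90035084]';
won  = [0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1]';

kWin  = kurt(won == 1);   % kurtosis values of profitable trades
kLoss = kurt(won == 0);   % kurtosis values of losing trades

% Two-sample t-test without assuming equal variances (Welch).
[~, pT] = ttest2(kWin, kLoss, 'Vartype', 'unequal');

% Mann-Whitney U test, which MATLAB calls the Wilcoxon rank-sum test.
pU = ranksum(kWin, kLoss);

fprintf('t-test p = %.4f, rank-sum p = %.4f\n', pT, pU);
```

Kurtosis values tend to be skewed with occasional large outliers (6.2 in the excerpt), which is one reason to look at the rank-based test alongside the t-test.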

If you don't get any statistically significant difference, then the value of the continuous variable is not likely to be useful (on its own) in predicting the binary tag (1/0).

How many observations do you have?

Have you tried something really simple? Look at the raw profit: is there any statistically significant correlation between the input and the profitability? Try ordinary Pearson correlation and Spearman rank correlation (which doesn't assume Gaussianity).
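
Both correlations can be computed with corr from the Statistics Toolbox. The sketch below again uses only the 20 posted rows, with illustrative variable names, so treat the numbers it prints as a smoke test rather than a result.

```matlab
% Raw profit/loss and kurtosis at entry for the 20 posted trades.
profit = [-41.82 -61.82 -45.82 -100.82 -14.82 -30.82 -75.82 -62.82 ...
           28.18  12.18   6.18  -76.82   1.18  54.18  24.18   2.18 ...
           33.18  18.18 -160.82  51.18]';
kurt   = [2.97530011 3.12553938 2.32907776 1.9039837  2.35415757 ...
          2.23130752 2.76931501 1.60114237 3.29114443 2.99822995 ...
          2.27906841 1.59931087 6.21691918 3.15778944 4.98645291 ...
          2.84275644 2.36483006 2.17924754 1.86165136 1.90035084]';

% Pearson (linear) correlation and its p-value.
[rP, pP] = corr(kurt, profit);

% Spearman rank correlation, which only uses the ordering of the values.
[rS, pS] = corr(kurt, profit, 'Type', 'Spearman');

fprintf('Pearson  r = %.3f (p = %.4f)\n', rP, pP);
fprintf('Spearman r = %.3f (p = %.4f)\n', rS, pS);
```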

Think first, then compute.

glmfit in Statistics Toolbox in MATLAB will do logistic regression (and other stuff too).
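
A minimal sketch of such a glmfit call, assuming the Statistics Toolbox and using only the 20 rows posted at the top of the thread. The variable names are made up for illustration, and a sample this small is far too thin to draw conclusions from.

```matlab
% Kurtosis at entry and win/loss flag for the 20 posted trades.
kurt = [2.97530011 3.12553938 2.32907776 1.9039837  2.35415757 ...
        2.23130752 2.76931501 1.60114237 3.29114443 2.99822995 ...
        2.27906841 1.59931087 6.21691918 3.15778944 4.98645291 ...
        2.84275644 2.36483006 2.17924754 1.86165136 1.90035084]';
won  = [0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1]';

% Fit log-odds(win) = b(1) + b(2) * kurtosis.
[b, dev, stats] = glmfit(kurt, won, 'binomial', 'link', 'logit');

fprintf('intercept = %.3f, slope = %.3f\n', b(1), b(2));
fprintf('slope p-value = %.4f\n', stats.p(2));
fprintf('odds ratio per unit of kurtosis = %.3f\n', exp(b(2)));

% Predicted win probability over a grid of kurtosis values.
kGrid = linspace(min(kurt), max(kurt), 50)';
pHat  = glmval(b, kGrid, 'logit');

plot(kurt, won, 'o', kGrid, pHat, '-');
xlabel('kurtosis at entry');
ylabel('P(profitable)');
```

On interpreting the output: b(2) is the change in log-odds of a winning trade per unit of kurtosis, exp(b(2)) is the corresponding odds ratio, and stats.p(2) is the p-value for the slope being zero. The fitted curve from glmval is the binary-outcome counterpart of a fitted regression line.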

#5
Last Post: Feb 16, 2011 10:14am

mfurlend
| Joined Apr 2010 | Status: Trader | 165 Posts

Quote

How many observations do you have?

I intend to run this process in 3 dimensions, with the 3rd dimension being different optimizations of another variable. This will, of course, yield results with different numbers of trades, so anywhere between 500 and 10,000.

Here is a similar process that I applied to 3 continuous values: # bars, kstd, and profit factor. Each point is a different optimization result.

Initial scatterplot:
http://furlender.com/forex/scatterplot.avi

Fitted with polynomial regression:
http://furlender.com/forex/polyfit.avi

I want to achieve something similar to this, except with one of the values being binary.

I will definitely try the statistical tests you mentioned first, and if a relationship is found, it looks like logistic regression is the way to go for further information.

Thanks everyone for your answers!

P.S. OldQuant, when I'm done I'll post the results, since you're interested.
