  • Platform Tech
  • /

best SQL schema for tick data

  • Post #1
  • Quote
  • First Post: Oct 12, 2010 11:11pm
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
I am planning to dump the tick data from Dukascopy into an MSSQL database for offline statistical analysis.

I have heard people say that relational databases are not best suited for time series data, but it is easy to run different queries with a bit of SQL, which saves you the time of writing code to analyze binary or CSV data. So I decided to go the database way.

The data from Dukascopy is:
time; bid; bid volume; ask; ask volume

so I am assuming the natural SQL table structure is:
time (datetime); bid (decimal); bid volume (int); ask (decimal); ask volume (int)

For those who have done this, please advise.
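For comparison, the table structure proposed above can be sketched as follows. This is a minimal sketch using in-memory SQLite as a stand-in for MSSQL; the column names and the sample tick are illustrative assumptions, not from the original post.

```python
import sqlite3

# Sketch of the proposed tick schema (SQLite stand-in for MSSQL).
# Column names and the sample row are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tick (
        time       TEXT    NOT NULL,  -- datetime of the quote
        bid        REAL    NOT NULL,  -- decimal in MSSQL
        bid_volume INTEGER NOT NULL,
        ask        REAL    NOT NULL,  -- decimal in MSSQL
        ask_volume INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO tick VALUES (?, ?, ?, ?, ?)",
             ("2010-10-12 23:11:00.123", 1.39215, 2, 1.39230, 1))
row = conn.execute("SELECT bid, ask FROM tick").fetchone()
print(row)
```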
  • Post #2
  • Quote
  • Oct 13, 2010 1:32am
  •  Ronald Raygun
  • Joined Jul 2007 | Status: 28 y/o Investor/Trader/Programmer | 5,016 Posts
Well, if your database isn't constantly expanding, it doesn't really matter what schema you use. I would worry about getting the queries working first, then focus on optimizing the database.
  • Post #3
  • Quote
  • Oct 13, 2010 8:12am
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
Thanks for responding, Ron. How is trading going?

I am actually considering breaking up the dateTime column into separate date and time columns, so it is more flexible to query. I was wondering if anyone has done it this way.
The advantage of doing this is that you can use the date as a foreign-key criterion and compare prices from different dates over the same time period with a shorter query string.
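To illustrate the split-column idea: with separate date and time columns, comparing the same time of day across different dates becomes a simple equality filter. This is a hypothetical sketch in SQLite; the table, column names, and sample prices are made up.

```python
import sqlite3

# Hypothetical demo of separate `date` and `time` columns: querying the
# same time of day across dates is a plain equality filter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tick (date TEXT, time TEXT, bid REAL)")
conn.executemany("INSERT INTO tick VALUES (?, ?, ?)", [
    ("2010-10-12", "14:00:00", 1.3920),
    ("2010-10-13", "14:00:00", 1.3955),
    ("2010-10-13", "15:00:00", 1.3940),
])
# One bid per date, all at the same time of day.
rows = conn.execute(
    "SELECT date, bid FROM tick WHERE time = '14:00:00' ORDER BY date"
).fetchall()
print(rows)
```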
  • Post #4
  • Quote
  • Oct 15, 2010 5:02pm
  •  redbaron1981
  • Joined Jul 2009 | Status: Member | 329 Posts
Just an idea.

First of all I am very new to this kind of stuff but I'll mention a few things here that may help you along the way.

http://csved.sjfrancke.nl/

This is a really handy .csv editor if you need to do any editing on your data before importing it. What's quite good about it is that you can add characters into already-existing data. For example, Forex Tester data has no separators between hours, minutes, and seconds, and hardly any programs recognize it like that, but with CSVed you can add the separators.

Secondly, and I'm still working on this, you can use a data miner to extract the time-relevant data. These two are really easy to use; just look at the tutorials on YouTube.

http://www.knime.org/

http://rapid-i.com/

Also, I believe (though I haven't tried it yet) you can export to the R project.

http://www.r-project.org/

It is a really powerful statistical analysis tool.

The best thing about the last three pieces of software mentioned is that they're open source and free to use! : )

http://www.forexfactory.com/showthread.php?t=167720

The thread above was started by Mikkom, who's also using SQL to interrogate a database.

Good luck, hope this helps.

James.
  • Post #5
  • Quote
  • Oct 15, 2010 6:38pm
  •  FrinkFX
  • | Joined May 2009 | Status: Member | 251 Posts
I'm doing the exact same thing...
The Dukascopy data seems very good but would take ages to download, as the files are split by hour.

I'm using MySQL as my database but here's my SQL code:
Inserted Code
CREATE TABLE `tick`
(
    `id`         BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    `providerId` INT    UNSIGNED NOT NULL DEFAULT 0,
    `pairId`     INT    UNSIGNED NOT NULL,
    `time`       DATETIME        NOT NULL,
    `msec`       INT    UNSIGNED NOT NULL DEFAULT 0, -- used with high-quality tick feeds (eg. Dukascopy)
    `bid`        INT    UNSIGNED NOT NULL,
    `ask`        INT    UNSIGNED NOT NULL,
    `gcId`       BIGINT UNSIGNED UNIQUE, -- GAIN Capital's ID, unique to each tick (used for reference)

    INDEX(`pairId`,`time`,`msec`),
    INDEX(`time`,`msec`)
);

I don't yet have the BidVolume and AskVolume but I'll probably add them too. I use 'providerId' to denote the data source: Dukascopy/GAIN Capital/Alpari ... etc.

Hope this helps
  • Post #6
  • Quote
  • Oct 16, 2010 5:50am
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
Relational databases are a poor choice for time series data, and doubly so for high-frequency tick data. I am logging 100,000+ ticks on EURUSD _alone_, _per day_. If you're tracking 10 pairs, you're looking at a million rows per day. I doubt you can support that on MSSQL (or even Oracle) on consumer-grade hardware.
  • Post #7
  • Quote
  • Oct 16, 2010 11:21am
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
Thanks, James, for the info about the CSV editor. I will experiment and break the datetime column down into day, hour, and minute columns. Some of these will be foreign keys, so you can query patterns against the hour, minute, etc. of each day.

I will look further into KNIME and R. They seem like very good tools for analyzing data.
Quoting redbaron1981
Disliked
Just an idea.

First of all I am very new to this kind of stuff but I'll mention a few things here that may help you along the way.

http://csved.sjfrancke.nl/

This is a really handy .csv editor if you need to do any editing on your data before importing it. What's quite good about it is that you can add characters into already-existing data. For example, Forex Tester data has no separators between hours, minutes, and seconds, and hardly any programs recognize it like that, but with CSVed you can add the separators.

Secondly and I'm still...
Ignored
  • Post #8
  • Quote
  • Oct 16, 2010 11:25am
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
Frinks,
Bid volume and ask volume are sometimes good for momentum analysis. You can concatenate the individual tick CSVs into one big file.
Thanks for the schema by the way.

Quoting FrinkFX
Disliked
I don't yet have the BidVolume and AskVolume but I'll probably add them too. I use 'providerId' to denote the data source: Dukascopy/GAIN Capital/Alpari ... etc.

Hope this helps
Ignored
  • Post #9
  • Quote
  • Oct 16, 2010 12:25pm
  •  7bit
  • Joined Mar 2009 | Status: Member | 1,231 Posts
Quoting pippero
Disliked
If you're tracking 10 pairs you're looking at a million rows per day. I doubt you can support that on mssql (or even oracle) on consumer grade hardware..
Ignored
Maybe smaller is better. MySQL will run on "consumer grade" hardware and is said to be amongst the fastest known databases when it comes to read only queries. I would not install something of the caliber of Oracle or a comparably huge monster database with thousands of features you won't ever need for a task like this.
  • Post #10
  • Quote
  • Oct 16, 2010 2:50pm
  •  Adal
  • Joined Mar 2009 | Status: Member | 770 Posts
There are many discussions about tick storage on ET:
http://www.elitetrader.com/vb/showth...threadid=81345
http://www.elitetrader.com/vb/showth...hreadid=201252
http://www.google.com/search?q=tick%...litetrader.com

I use compressed binary files for tick data storage, since I have over 4 billion ticks - all available instrument from Dukascopy, GAIN, dbFX, FXCM and TrueFX.

I used SQLite first, then PostgreSQL, but they just don't scale to this amount of tick data (at least on high-end desktop hardware).
  • Post #11
  • Quote
  • Oct 16, 2010 3:29pm
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
Quoting 7bit
Disliked
Maybe smaller is better. MySQL will run on "consumer grade" hardware and is said to be amongst the fastest known databases when it comes to read only queries. I would not install something of the caliber of Oracle or a comparably huge monster database with thousands of features you won't ever need for a task like this.
Ignored
Sure, as long as you don't plan on logging more than a few instruments for a few months.
  • Post #12
  • Quote
  • Oct 17, 2010 12:20pm
  •  7bit
  • Joined Mar 2009 | Status: Member | 1,231 Posts
Quoting pippero
Disliked
Sure, as long as you don't plan on logging more than a few instruments for a few months.
Ignored
How can you query your huge monster CSV files faster than a DB server could do it with its highly optimized storage and query engine?
  • Post #13
  • Quote
  • Oct 17, 2010 2:09pm
  •  redbaron1981
  • Joined Jul 2009 | Status: Member | 329 Posts
Hey Guys,

In my post above I mentioned my idea of breaking the data down with a data miner to separate the time groups I was looking for. This is actually quite an easy task, and I can export the results to .csv or R for analysis.

I want to finish with a database from which I can extract information such as average hourly ranges, when statistically the high or low of the day was made, etc., a little bit like the charts on Mikkom's thread.

http://www.forexfactory.com/showthre...17#post2720917

I know that I can extract the time/date data I want quite easily; however, my problem from that point would be extracting the exact values for highs, lows, etc. to be able to generate a chart.

Do you guys think it would be better for me to use MySQL to interrogate the DB rather than a data miner, and would it then be easier to extract what is needed rather than using a script to extract it?

Your advice would be much appreciated.
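The average-hourly-range extraction described above can be done directly in SQL rather than with a data miner. Here is a hedged sketch in SQLite via Python; the schema, sample ticks, and the use of bid-only prices are assumptions for illustration.

```python
import sqlite3

# Sketch: average hourly range straight from SQL. Group ticks by
# (day, hour), take high - low per group, then average per hour of day.
# Table layout and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tick (time TEXT, bid REAL)")
conn.executemany("INSERT INTO tick VALUES (?, ?)", [
    ("2010-10-12 14:05:00", 1.3920),
    ("2010-10-12 14:40:00", 1.3950),
    ("2010-10-13 14:10:00", 1.3910),
    ("2010-10-13 14:55:00", 1.3970),
])
rows = conn.execute("""
    SELECT hour, AVG(hi - lo) AS avg_range FROM (
        SELECT strftime('%H', time) AS hour,
               date(time)           AS day,
               MAX(bid) AS hi, MIN(bid) AS lo
        FROM tick
        GROUP BY day, hour
    ) AS per_hour
    GROUP BY hour
""").fetchall()
print(rows)  # one (hour, average range) row per hour of day
```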
  • Post #14
  • Quote
  • Oct 18, 2010 7:59am
  •  FrinkFX
  • | Joined May 2009 | Status: Member | 251 Posts
Quoting neomn
Disliked
Frinks,
Bid volume and Ask volume is sometime good for momentum analysis. You can concatenate the individual tick csv into one big file.
Thanks for the schema by the way.
Ignored
No problem. The reason I have the bid/ask as unsigned integers, BTW, is that my system avoids floating-point arithmetic as much as possible and does everything in pips and pennies. This provides a huge boost (I hope).

No doubt I'll include bid volume and ask volume once I've done some work on the system and have some sort of stable version up and running.
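The fixed-point idea described above can be sketched like this: store prices as integers in units of the smallest quote increment, so spreads and differences stay in exact integer arithmetic. The scale factor here assumes a 5-decimal EURUSD feed; the helper names are hypothetical.

```python
# Sketch of fixed-point price storage: prices held as unsigned integers
# in units of 0.00001 (tenths of a pip on a 5-decimal feed). The scale
# and function names are assumptions for illustration.
SCALE = 100_000  # 1.39215 -> 139215

def to_int_price(price: float) -> int:
    """Convert a quoted price to an integer count of 1/100000ths."""
    return round(price * SCALE)

def to_float_price(units: int) -> float:
    """Convert back for display only; arithmetic stays in integers."""
    return units / SCALE

bid = to_int_price(1.39215)
ask = to_int_price(1.39230)
spread_tenths_of_pip = ask - bid  # exact integer subtraction
print(bid, ask, spread_tenths_of_pip)
```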
  • Post #15
  • Quote
  • Nov 3, 2010 5:12pm
  •  letsarb
  • | Joined Apr 2010 | Status: Member | 26 Posts
Visiting this website could be a good idea. It works perfectly.

http://eareview.net/tick-data

Cheers...
  • Post #16
  • Quote
  • Nov 7, 2010 5:17am
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
Quoting 7bit
Disliked
How can you query your huge monster CSV files faster than a DB server could do it with its highly optimized storage and query engine?
Ignored
Where do you think a DB stores its data, all in RAM? No, it's on disk, like a CSV. A simple file and directory naming scheme will go a long way.

Let me remind you that tick data is generally strictly sequential and immutable -- it's not like you need to do DELETEs and UPDATEs on tick data that you logged yesterday. And when you need to do a SELECT, you usually need to select a bunch of "records" that are in sequential order.

The only thing faster than a flat file is a binary flat file. Don't use relational DBs when there are no relations to keep track of.
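The binary-flat-file approach above can be sketched with fixed-width records: because ticks are sequential and immutable, record N lives at byte offset N times the record size, so random access is a plain seek with no index at all. The record layout (epoch milliseconds plus integer bid/ask) is an assumption for illustration; an in-memory buffer stands in for an on-disk file.

```python
import io
import struct

# Fixed-width binary tick records: int64 timestamp_ms, uint32 bid,
# uint32 ask (little-endian, no padding). Layout is an assumption.
RECORD = struct.Struct("<qII")

buf = io.BytesIO()  # stands in for one on-disk file per pair/day
ticks = [(1287000000000, 139215, 139230),
         (1287000000250, 139216, 139229)]
for t in ticks:
    buf.write(RECORD.pack(*t))  # append-only, strictly sequential

# Random access by record index: seek straight to the offset, no index.
buf.seek(1 * RECORD.size)
ts, bid, ask = RECORD.unpack(buf.read(RECORD.size))
print(ts, bid, ask)
```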
  • Post #17
  • Quote
  • Nov 7, 2010 2:10pm
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
But don't take my word for it -- try theirs: http://www.puppetmastertrading.com/b...rt-2/#more-280

Some more appropriate DBs for columnar tick data:

http://www.infobright.org/ $$$
http://monetdb.cwi.nl/ opensource
http://kx.com/ $$$

and file formats:

http://en.wikipedia.org/wiki/Hierarchical_Data_Format
  • Post #18
  • Quote
  • Feb 16, 2011 11:52pm
  •  noyou
  • | Joined Feb 2010 | Status: Member | 5 Posts
Quoting pippero
Disliked
But don't take my word for it -- try theirs: http://www.puppetmastertrading.com/b...rt-2/#more-280

Some more appropriate DBs for columnar tick data:

http://www.infobright.org/ $$$
http://monetdb.cwi.nl/ opensource
http://kx.com/ $$$

and file formats:

http://en.wikipedia.org/wiki/Hierarchical_Data_Format
Ignored

Hi, Pippero,

Is there a step-by-step instruction to build those HDF5 files from ground up?

Thank you in advance!
  • Post #19
  • Quote
  • Last Post: Feb 17, 2011 12:09am
  •  Kenz987
  • Joined Aug 2006 | Status: Member | 737 Posts
I keep all my data in the NinjaTrader version 6 database, which is MS Access. (Version 7 has been upgraded to SQL.) It's not a huge amount. NT has a good built-in strategy tester.
Forex Factory® is a brand of Fair Economy, Inc.

Terms of Service / ©2021