  • Platform Tech
  • /

best SQL schema for tick data

  • Post #1
  • Quote
  • First Post: Oct 12, 2010 11:11pm
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
I am planning to dump the tick data from Dukascopy into an MSSQL database for offline statistical analysis.

I have heard people say that relational databases are not best suited for time series data, but it is easy to run different queries with a bit of SQL, which saves you the time of writing code to analyze binary or CSV data. So I decided to go the database way.

The data from Dukascopy is:
time; bid; bid volume; ask; ask volume

so I am assuming the natural SQL table structure is:
time (datetime); bid (decimal); bid volume (int); ask (decimal); ask volume (int)

For those who have done this, please advise.
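For comparison, the table structure proposed above can be sketched as follows. This is a minimal sketch using in-memory SQLite as a stand-in for MSSQL; the column names and the sample tick are illustrative assumptions, not from the original post.

```python
import sqlite3

# Sketch of the proposed tick schema (SQLite stand-in for MSSQL).
# Column names and the sample row are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tick (
        time       TEXT    NOT NULL,  -- datetime of the quote
        bid        REAL    NOT NULL,  -- decimal in MSSQL
        bid_volume INTEGER NOT NULL,
        ask        REAL    NOT NULL,  -- decimal in MSSQL
        ask_volume INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO tick VALUES (?, ?, ?, ?, ?)",
             ("2010-10-12 23:11:00.123", 1.39215, 2, 1.39230, 1))
row = conn.execute("SELECT bid, ask FROM tick").fetchone()
print(row)
```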
  • Post #2
  • Quote
  • Oct 13, 2010 1:32am
  •  Ronald Raygun
  • Joined Jul 2007 | Status: 28 y/o Investor/Trader/Programmer | 5,016 Posts
Well, if your database isn't constantly expanding, it doesn't really matter what schema you use. I would worry about getting the queries working first, then focus on optimizing the database.
  • Post #3
  • Quote
  • Oct 13, 2010 8:12am
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
Thanks for responding, Ron. How is trading going?

I am actually considering breaking up the dateTime column into separate date and time columns, so it is more flexible to query. I was wondering if anyone has done it this way.
The advantage of doing this is that you can use the date as a foreign-key criterion and compare prices from different dates over the same time period with a shorter query string.
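To illustrate the split-column idea: with separate date and time columns, comparing the same time of day across different dates becomes a simple equality filter. This is a hypothetical sketch in SQLite; the table, column names, and sample prices are made up.

```python
import sqlite3

# Hypothetical demo of separate `date` and `time` columns: querying the
# same time of day across dates is a plain equality filter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tick (date TEXT, time TEXT, bid REAL)")
conn.executemany("INSERT INTO tick VALUES (?, ?, ?)", [
    ("2010-10-12", "14:00:00", 1.3920),
    ("2010-10-13", "14:00:00", 1.3955),
    ("2010-10-13", "15:00:00", 1.3940),
])
# One bid per date, all at the same time of day.
rows = conn.execute(
    "SELECT date, bid FROM tick WHERE time = '14:00:00' ORDER BY date"
).fetchall()
print(rows)
```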
  • Post #4
  • Quote
  • Oct 15, 2010 5:02pm
  •  redbaron1981
  • Joined Jul 2009 | Status: Member | 329 Posts
Just an idea.

First of all I am very new to this kind of stuff but I'll mention a few things here that may help you along the way.

http://csved.sjfrancke.nl/

This is a really handy .csv editor if you need to do any editing on your data before importing it. What's quite good about it is that you can add characters into already-existing data. For example, Forex Tester data has no separators between hours, minutes, and seconds, and hardly any programs recognize it like that, but with CSVed you can add the separators.

Secondly, and I'm still working on this, you can use a data miner to extract the time-relevant data. These two are really easy to use; just look at the tutorials on YouTube.

http://www.knime.org/

http://rapid-i.com/

Also, I believe (though I haven't tried it yet) you can export to the R project.

http://www.r-project.org/

It is a really powerful statistical analysis tool.

The best thing about the last three pieces of software mentioned is that they're open source and free to use! : )

http://www.forexfactory.com/showthread.php?t=167720

The thread above was started by Mikkom, who's also using SQL to interrogate a database.

Good luck, hope this helps.

James.
  • Post #5
  • Quote
  • Oct 15, 2010 6:38pm
  •  FrinkFX
  • | Joined May 2009 | Status: Member | 251 Posts
I'm doing the exact same thing...
The Dukascopy data seems very good but would take ages to download, as the files are split by hour.

I'm using MySQL as my database but here's my SQL code:
Inserted Code
CREATE TABLE `tick`
(
    `id`         BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,
    `providerId` INT    UNSIGNED NOT NULL DEFAULT 0,
    `pairId`     INT    UNSIGNED NOT NULL,
    `time`       DATETIME        NOT NULL,
    `msec`       INT    UNSIGNED NOT NULL DEFAULT 0, -- used with high-quality tick feeds (eg. Dukascopy)
    `bid`        INT    UNSIGNED NOT NULL,
    `ask`        INT    UNSIGNED NOT NULL,
    `gcId`       BIGINT UNSIGNED UNIQUE, -- GAIN Capital's ID, unique to each tick (used for reference)

    INDEX(`pairId`,`time`,`msec`),
    INDEX(`time`,`msec`)
);

I don't yet have the BidVolume and AskVolume but I'll probably add them too. I use 'providerId' to denote the data source: Dukascopy/GAIN Capital/Alpari ... etc.

Hope this helps
  • Post #6
  • Quote
  • Oct 16, 2010 5:50am
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
Relational databases are a poor choice for time series data, and doubly so for high-frequency tick data. I am logging 100,000+ ticks on EURUSD _alone_, _per day_. If you're tracking 10 pairs, you're looking at a million rows per day. I doubt you can support that on MSSQL (or even Oracle) on consumer-grade hardware.
  • Post #7
  • Quote
  • Oct 16, 2010 11:21am
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
Thanks, James, for the info about the CSV editor. I will experiment and break the datetime column down into day, hour, and minute columns. Some of these will be foreign keys, so you can query patterns against the hour, minute, etc. of each day.

I will look further into KNIME and R. They seem like very good tools for analyzing data.
Quoting redbaron1981
Disliked
Just an idea.

First of all I am very new to this kind of stuff but I'll mention a few things here that may help you along the way.

http://csved.sjfrancke.nl/

This is a really handy .csv editor if you need to do any editing on your data before importing it. What's quite good about it is that you can add characters into already-existing data. For example, Forex Tester data has no separators between hours, minutes, and seconds, and hardly any programs recognize it like that, but with CSVed you can add the separators.

Secondly and I'm still...
Ignored
  • Post #8
  • Quote
  • Oct 16, 2010 11:25am
  •  neomn
  • | Joined Jan 2010 | Status: Member | 86 Posts
Frinks,
Bid volume and ask volume are sometimes good for momentum analysis. You can concatenate the individual tick CSVs into one big file.
Thanks for the schema by the way.

Quoting FrinkFX
Disliked
I don't yet have the BidVolume and AskVolume but I'll probably add them too. I use 'providerId' to denote the data source: Dukascopy/GAIN Capital/Alpari ... etc.

Hope this helps
Ignored
  • Post #9
  • Quote
  • Oct 16, 2010 12:25pm
  •  7bit
  • Joined Mar 2009 | Status: Member | 1,231 Posts
Quoting pippero
Disliked
If you're tracking 10 pairs you're looking at a million rows per day. I doubt you can support that on mssql (or even oracle) on consumer grade hardware..
Ignored
Maybe smaller is better. MySQL will run on "consumer grade" hardware and is said to be amongst the fastest known databases when it comes to read only queries. I would not install something of the caliber of Oracle or a comparably huge monster database with thousands of features you won't ever need for a task like this.
  • Post #10
  • Quote
  • Oct 16, 2010 2:50pm
  •  Adal
  • Joined Mar 2009 | Status: Member | 770 Posts
There are many discussions about tick storage on ET:
http://www.elitetrader.com/vb/showth...threadid=81345
http://www.elitetrader.com/vb/showth...hreadid=201252
http://www.google.com/search?q=tick%...litetrader.com

I use compressed binary files for tick data storage, since I have over 4 billion ticks - all available instrument from Dukascopy, GAIN, dbFX, FXCM and TrueFX.

I used SQLite first, then PostgreSQL, but they just don't scale to this amount of tick data (at least on high-end desktop hardware).
  • Post #11
  • Quote
  • Oct 16, 2010 3:29pm
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
Quoting 7bit
Disliked
Maybe smaller is better. MySQL will run on "consumer grade" hardware and is said to be amongst the fastest known databases when it comes to read only queries. I would not install something of the caliber of Oracle or a comparably huge monster database with thousands of features you won't ever need for a task like this.
Ignored
Sure, as long as you don't plan on logging more than a few instruments for a few months.
  • Post #12
  • Quote
  • Oct 17, 2010 12:20pm
  •  7bit
  • Joined Mar 2009 | Status: Member | 1,231 Posts
Quoting pippero
Disliked
Sure, as long as you don't plan on logging more than a few instruments for a few months.
Ignored
How can you query your huge monster CSV files faster than a DB server could do it with its highly optimized storage and query engine?
  • Post #13
  • Quote
  • Oct 17, 2010 2:09pm
  •  redbaron1981
  • Joined Jul 2009 | Status: Member | 329 Posts
Hey Guys,

In my post above I mentioned my idea of breaking the data down with a data miner to separate the time groups I was looking for. This is actually quite an easy task, and I can export the results to .csv or R for analysis.

I want to finish with a database from which I can extract information such as average hourly ranges, when statistically the high or low of the day was made, etc., a little bit like the charts on Mikkom's thread.

http://www.forexfactory.com/showthre...17#post2720917

I know that I can extract the time/date data I want quite easily; however, my problem from that point would be extracting the exact values for highs, lows, etc. to be able to generate a chart.

Do you guys think it would be better for me to use MySQL to interrogate the DB rather than a data miner, and would it then be easier to extract what is needed rather than using a script to extract it?

Your advice would be much appreciated.
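The average-hourly-range extraction described above can be done directly in SQL rather than with a data miner. Here is a hedged sketch in SQLite via Python; the schema, sample ticks, and the use of bid-only prices are assumptions for illustration.

```python
import sqlite3

# Sketch: average hourly range straight from SQL. Group ticks by
# (day, hour), take high - low per group, then average per hour of day.
# Table layout and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tick (time TEXT, bid REAL)")
conn.executemany("INSERT INTO tick VALUES (?, ?)", [
    ("2010-10-12 14:05:00", 1.3920),
    ("2010-10-12 14:40:00", 1.3950),
    ("2010-10-13 14:10:00", 1.3910),
    ("2010-10-13 14:55:00", 1.3970),
])
rows = conn.execute("""
    SELECT hour, AVG(hi - lo) AS avg_range FROM (
        SELECT strftime('%H', time) AS hour,
               date(time)           AS day,
               MAX(bid) AS hi, MIN(bid) AS lo
        FROM tick
        GROUP BY day, hour
    ) AS per_hour
    GROUP BY hour
""").fetchall()
print(rows)  # one (hour, average range) row per hour of day
```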
  • Post #14
  • Quote
  • Oct 18, 2010 7:59am
  •  FrinkFX
  • | Joined May 2009 | Status: Member | 251 Posts
Quoting neomn
Disliked
Frinks,
Bid volume and Ask volume is sometime good for momentum analysis. You can concatenate the individual tick csv into one big file.
Thanks for the schema by the way.
Ignored
No problem. The reason I have the bid/ask as unsigned integers, BTW, is that my system avoids floating-point arithmetic as much as possible and does everything in pips and pennies. This provides a huge boost (I hope).

No doubt I'll include bid volume and ask volume once I've done some work on the system and have some sort of stable version up and running.
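The fixed-point idea described above can be sketched like this: store prices as integers in units of the smallest quote increment, so spreads and differences stay in exact integer arithmetic. The scale factor here assumes a 5-decimal EURUSD feed; the helper names are hypothetical.

```python
# Sketch of fixed-point price storage: prices held as unsigned integers
# in units of 0.00001 (tenths of a pip on a 5-decimal feed). The scale
# and function names are assumptions for illustration.
SCALE = 100_000  # 1.39215 -> 139215

def to_int_price(price: float) -> int:
    """Convert a quoted price to an integer count of 1/100000ths."""
    return round(price * SCALE)

def to_float_price(units: int) -> float:
    """Convert back for display only; arithmetic stays in integers."""
    return units / SCALE

bid = to_int_price(1.39215)
ask = to_int_price(1.39230)
spread_tenths_of_pip = ask - bid  # exact integer subtraction
print(bid, ask, spread_tenths_of_pip)
```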
  • Post #15
  • Quote
  • Nov 3, 2010 5:12pm
  •  letsarb
  • | Joined Apr 2010 | Status: Member | 26 Posts
Visiting this website could be a good idea. It works perfectly.

http://eareview.net/tick-data

Cheers...
  • Post #16
  • Quote
  • Nov 7, 2010 5:17am
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
Quoting 7bit
Disliked
How can you query your huge monster CSV files faster than a DB server could do it with its highly optimized storage and query engine?
Ignored
Where do you think a DB stores its data, all in RAM? No, it's on disk, like a CSV. A simple file and directory naming scheme will go a long way.

Let me remind you that tick data is generally strictly sequential and immutable -- it's not like you need to do DELETEs and UPDATEs on tick data that you logged yesterday. And when you need to do a SELECT, you usually need to select a bunch of "records" that are in sequential order.

The only thing faster than a flat file is a binary flat file. Don't use relational DBs when there are no relations to keep track of.
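The binary-flat-file approach above can be sketched with fixed-width records: because ticks are sequential and immutable, record N lives at byte offset N times the record size, so random access is a plain seek with no index at all. The record layout (epoch milliseconds plus integer bid/ask) is an assumption for illustration; an in-memory buffer stands in for an on-disk file.

```python
import io
import struct

# Fixed-width binary tick records: int64 timestamp_ms, uint32 bid,
# uint32 ask (little-endian, no padding). Layout is an assumption.
RECORD = struct.Struct("<qII")

buf = io.BytesIO()  # stands in for one on-disk file per pair/day
ticks = [(1287000000000, 139215, 139230),
         (1287000000250, 139216, 139229)]
for t in ticks:
    buf.write(RECORD.pack(*t))  # append-only, strictly sequential

# Random access by record index: seek straight to the offset, no index.
buf.seek(1 * RECORD.size)
ts, bid, ask = RECORD.unpack(buf.read(RECORD.size))
print(ts, bid, ask)
```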
  • Post #17
  • Quote
  • Nov 7, 2010 2:10pm
  •  pippero
  • | Joined Oct 2007 | Status: Trend following is for teens | 216 Posts
But don't take my word for it -- try theirs: http://www.puppetmastertrading.com/b...rt-2/#more-280

Some more appropriate DBs for columnar tick data:

http://www.infobright.org/ $$$
http://monetdb.cwi.nl/ opensource
http://kx.com/ $$$

and file formats:

http://en.wikipedia.org/wiki/Hierarchical_Data_Format
  • Post #18
  • Quote
  • Feb 16, 2011 11:52pm
  •  noyou
  • | Joined Feb 2010 | Status: Member | 5 Posts
Quoting pippero
Disliked
But don't take my word for it -- try theirs: http://www.puppetmastertrading.com/b...rt-2/#more-280

Some more appropriate DBs for columnar tick data:

http://www.infobright.org/ $$$
http://monetdb.cwi.nl/ opensource
http://kx.com/ $$$

and file formats:

http://en.wikipedia.org/wiki/Hierarchical_Data_Format
Ignored

Hi, Pippero,

Is there a step-by-step instruction to build those HDF5 files from ground up?

Thank you in advance!
  • Post #19
  • Quote
  • Last Post: Feb 17, 2011 12:09am
  •  Kenz987
  • Joined Aug 2006 | Status: Member | 737 Posts
I keep all my data in the NinjaTrader version 6 database, which is MS Access. (Version 7 has been upgraded to SQL.) It's not a huge amount. NT has a good built-in strategy tester.
Forex Factory® is a brand of Fair Economy, Inc.

Terms of Service / ©2021