Faster imports via shared CDN service
#1
The current way of doing imports is less than ideal. For very small datasets it isn't a problem, but a year or more worth of data quickly becomes one.

So...

1. Route all imports via a cache (a CDN or another server/service such as GitHub).
2. If the cache has data available for date X, get that period from the cache, else get it from the exchange API (see the sketch after this list).
3. Have an option to upload the data on completion (thus making the data available for others).
--
4. Don't store the data as plain databases, but rather as e.g. CSV files, so that they can be gzipped.
5. After import, convert the gzipped files to databases.
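
A minimal sketch of what this cache-first flow could look like, assuming a hypothetical cache endpoint (`CACHE_URL`) serving one gzipped CSV file per exchange/pair/day, and a `fetchFromExchange` placeholder standing in for Gekko's current importer. Neither exists today; this is only meant to illustrate steps 1-3.

```typescript
import { gunzipSync } from "zlib";

// Candle shape roughly matching what Gekko stores per minute.
interface Candle {
  start: number; open: number; high: number; low: number; close: number; volume: number;
}

// Hypothetical shared cache; placeholder URL, no such service exists yet.
const CACHE_URL = "https://cache.example.org";

// Placeholder for Gekko's existing exchange importer.
declare function fetchFromExchange(exchange: string, pair: string, day: string): Promise<Candle[]>;

async function importDay(exchange: string, pair: string, day: string): Promise<Candle[]> {
  // Steps 1+2: try the shared cache first, fall back to the exchange API on a miss.
  const res = await fetch(`${CACHE_URL}/${exchange}/${pair}/${day}.csv.gz`);
  if (res.ok) {
    const csv = gunzipSync(Buffer.from(await res.arrayBuffer())).toString("utf8");
    return csv.trim().split("\n").map(line => {
      const [start, open, high, low, close, volume] = line.split(",").map(Number);
      return { start, open, high, low, close, volume };
    });
  }
  const candles = await fetchFromExchange(exchange, pair, day);
  // Step 3 (optional): upload the freshly imported day so others can reuse it, e.g.
  // await fetch(`${CACHE_URL}/${exchange}/${pair}/${day}.csv.gz`, { method: "PUT", body: gzippedCsv });
  return candles;
}
```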

So the problem...
Most exchanges have a TOS that prevents their data from being shared or made easily available.
Thus the data would need to be obfuscated so that it only works (easily) via Gekko.
#2
This is very interesting, and yes, it takes a lot of time (with a Raspberry Pi it took even longer than on Win7).

Unfortunately I can't fully understand the subject yet.
#3
I wonder if a (public *cough*) reverse proxy cache is against the TOS. The setup should be quite easy with nginx or Varnish: the reverse proxy would either serve the requested data from a local cache or fetch it from the corresponding exchange in the background. The response times could be reduced dramatically for many users.
Gekko would just have to call an alternative URL for certain API endpoints and remove any API keys from the requests. You don't want anyone sitting in the middle sniffing API credentials.
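
For illustration, a rough sketch of such a caching reverse proxy as an nginx config. The hostname, cache path and the Binance upstream are just placeholders for whatever would actually be proxied:

```
# /etc/nginx/conf.d/gekko-cache.conf -- illustrative sketch only
proxy_cache_path /var/cache/gekko levels=1:2 keys_zone=exchange:10m max_size=10g inactive=30d;

server {
    listen 80;
    server_name cache.example.org;            # placeholder hostname

    location /binance/ {
        proxy_pass https://api.binance.com/;  # example upstream exchange API
        proxy_ssl_server_name on;
        proxy_cache exchange;
        proxy_cache_key $request_uri;
        proxy_cache_valid 200 24h;            # historical candles never change
        proxy_ignore_headers Cache-Control Expires;
        proxy_set_header Authorization "";    # strip any credentials from requests
        proxy_set_header X-MBX-APIKEY "";     # Binance's API key header
    }
}
```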
#4
It seems many sites simply use the data from the exchanges straight off; there are several, such as the larger Cryptocompare.com, that get (and share) all the data from the exchanges.
#5
> 4. Don't store the data as plain databases, but rather as e.g. CSV files, so that they can be gzipped.

You probably want to store it as binary json as opposed to utf8 (csv, json); that gzips even better in my experience.
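
For what it's worth, a quick size comparison is easy to run. A sketch using the `bson` npm package and Node's zlib; the candle values are made up:

```typescript
import { gzipSync } from "zlib";
import { BSON } from "bson";

// A day's worth of fake 1-minute candles, only to compare compressed sizes.
const candles = Array.from({ length: 1440 }, (_, i) => ({
  start: 1514764800 + i * 60,
  open: 100 + i * 0.01, high: 101 + i * 0.01, low: 99 + i * 0.01, close: 100.5 + i * 0.01,
  volume: 12.34,
}));

const asJson = gzipSync(JSON.stringify(candles));
const asBson = gzipSync(BSON.serialize({ candles }));
console.log(`gzipped JSON: ${asJson.length} bytes, gzipped BSON: ${asBson.length} bytes`);
```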

> Most exchanges have a TOS that prevents their data from being shared or made easily available.

Are you sure? I haven't heard this before; there are even some online services that are selling raw data for $$.

------

I am already storing a lot of different markets and will make this available as part of my Gekko service. I hope to make it available to everyone for free (I can maybe use Cloudflare), but I'm not sure how many people are looking for this much data.
#6
I am interested in huge data sets, thanks.
#7
(02-19-2018, 05:14 AM)askmike Wrote:
> You probably want to store it as binary json as opposed to utf8 (csv, json); that gzips even better in my experience.
>
> Are you sure? I haven't heard this before; there are even some online services that are selling raw data for $$.
> [...]

Thank you for taking the time to answer; I know you're busy, so it's much appreciated.

Yes, BSON would be even better. The storage method is the simplest part, though. :)

If one goes to the exchanges, they all have a TOS, but it sort of depends on what counts as 'sharing'. A format that is very specific to one tool and not resold shouldn't be a problem. As said -- the data is already used by several other sites, so it doesn't seem to be an issue. This was merely a heads up.

--

Backtesting requires as much data as possible, for obvious reasons: you want to cover as many different kinds of market periods as possible, since more data confirms a result to a much greater extent than less data.

This is true for all backtesting. That some seem to believe a 3m period is enough to confirm that a strategy works is quite insane, to be honest. Anyone can easily create a strategy that only works over such a short period.
#8
Any updates on this?

The problem is: say some data simply got corrupted. The easy solution is to delete that data and import it again, but say the data contains candles for over a year... that import takes so much time that deleting and re-importing becomes very tedious. It's also a problem because imports easily die, which creates new databases with bad data, and the cycle continues.

An idea for distributing the data (the sets can become gigantic...) is to use a torrent network, with a Node torrent client so the Gekko clients can connect to that network.
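
A minimal sketch of that idea with the `webtorrent` package. The magnet URI below is a dummy; a real dataset torrent would have to be published somewhere central:

```typescript
import WebTorrent from "webtorrent";
import { createWriteStream } from "fs";

// Dummy magnet URI standing in for a published candle-dataset torrent.
const DATASET_MAGNET = "magnet:?xt=urn:btih:0000000000000000000000000000000000000000";

const client = new WebTorrent();

client.add(DATASET_MAGNET, torrent => {
  console.log(`downloading ${torrent.name} from ${torrent.numPeers} peers`);
  for (const file of torrent.files) {
    // e.g. binance-BTC-USDT-2017.csv.gz -> written into the local history folder
    file.createReadStream().pipe(createWriteStream(`./history/${file.name}`));
  }
  torrent.on("done", () => {
    console.log("dataset complete, ready to be converted into Gekko databases");
    client.destroy();
  });
});
```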
#9
In my opinion the best way is something like phpMyAdmin on a public server, where everybody can choose periods, pairs and exchanges, and download the selected candles as an export to SQLite, Postgres or MongoDB.

It could probably be done with raw MySQL and a frontend like phpMyAdmin.
Or with a simpler script which, for example, downloads specified parts from the server (one pair at a time, in 7-day chunks) and merges all pairs into the user-specified datasets locally; see the sketch below.

The total size of all datasets from all supported exchanges should be a few GB, not more than about 50 GB, so it could be served from an ordinary webserver.
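
A sketch of such a merge script; the server URL and path layout are made up, and it uses `better-sqlite3` for the local database:

```typescript
import { gunzipSync } from "zlib";
import Database from "better-sqlite3";

// Hypothetical dataset server layout: /<exchange>/<pair>/<week-start>.csv.gz
const SERVER = "https://datasets.example.org"; // placeholder, no such server exists

async function mergePair(exchange: string, pair: string, weeks: string[]): Promise<void> {
  const db = new Database(`./history/${exchange}_${pair}.db`);
  db.exec(
    "CREATE TABLE IF NOT EXISTS candles (start INTEGER PRIMARY KEY, open REAL, high REAL, low REAL, close REAL, volume REAL)"
  );
  const insert = db.prepare("INSERT OR IGNORE INTO candles VALUES (?, ?, ?, ?, ?, ?)");

  for (const week of weeks) {
    const res = await fetch(`${SERVER}/${exchange}/${pair}/${week}.csv.gz`);
    if (!res.ok) continue; // chunk not available on the server (yet)
    const csv = gunzipSync(Buffer.from(await res.arrayBuffer())).toString("utf8");
    const rows = csv.trim().split("\n").map(line => line.split(",").map(Number));
    db.transaction(() => rows.forEach(r => insert.run(...r)))();
  }
  db.close();
}

// e.g. merge two 7-day chunks of one pair into a local sqlite database
mergePair("binance", "BTC-USDT", ["2018-01-01", "2018-01-08"]).catch(console.error);
```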