[gekko 0.6] API change for running a backtest
#1
I am currently doing a major rewrite of the event system within the codebase. There are a few reasons for doing so: it allows for new features like advanced order types, a faster backtester, a more streamlined event flow, and a lot more. I currently have an early version of the new backtester; you can find it in the PR for 0.6:

https://github.com/askmike/gekko/pull/1850

Since this commit the API endpoint for running a backtest has changed. Instead of passing a data object that defines what data you want back, you now add a new plugin to the config called the "backtestResultExporter", in which you specify what data should be returned. See here what the UI is passing. You can also request indicator results! Native indicator results are available; async ones are coming asap.

On top of that, this is what I am planning to do:

- We can actually optimize a lot of things if we know the backtest doesn't need to return any candles / indicator numbers / trades: not just at the end before we send the result out over the REST API, but even inside the gekko stream.
- We can drastically increase performance of async indicators by pregenerating all their outputs in batches. This means we don't have to ask TA-lib every candle, but just once every X candles (as many as we can store in memory).
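The batching idea above can be sketched roughly like this. Note this is an illustration of the technique, not Gekko's actual code; `computeBatch` stands in for a TA-lib call, and all names here are hypothetical:

```javascript
// Sketch: precompute indicator results in batches instead of per candle.
// `computeBatch` is a stand-in for an (expensive) TA-lib call that returns
// one result per input candle.
function makeBatchedIndicator(computeBatch, batchSize) {
  let cache = [];  // precomputed results for the current batch
  let offset = 0;  // candle index that cache[0] corresponds to

  return function result(candles, i) {
    if (i < offset || i >= offset + cache.length) {
      // Cache miss: compute the next `batchSize` results in one call.
      offset = i;
      cache = computeBatch(candles.slice(i, i + batchSize));
    }
    return cache[i - offset];
  };
}
```

With a batch size of, say, 1000, the expensive call happens once per 1000 candles instead of once per candle, at the cost of holding the batch output in memory.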

If anyone has more ideas please throw them here, since I am changing it right now anyway :)
#2
But isn't this sort of the same as now?

Code:
    'data' => [
        'candleProps' => 0,      // 0 = disable, else e.g. ['close', 'start']
        'indicatorResults' => 0, // 0 or 1 (does nothing?)
        'report' => 1,
        'roundtrips' => 1,       // set to 1 to get all roundtrips
        'trader' => 0,           // does nothing?
    ],

I don't really see the difference, except that we got a new array with some new keys?
As it is now, 'we' simply POST to my-server.com:3000/api/ with a gekkoConfig object; from a practical standpoint, what will change other than there being new keys that want new values?

Would it be possible for you to post a JSON sample of the new configuration object?
#3
There isn't that much difference, you just need to format the JSON you post differently (for now). I'm not ruling out that this is all that will change, though. This new interface allows me to do a ton more optimizations in the future.

This is what the UI is posting now:

Code:
{
  "watch": {
    "exchange": "binance",
    "currency": "USDT",
    "asset": "BTC"
  },
  "paperTrader": {
    "feeMaker": 0.25,
    "feeTaker": 0.25,
    "feeUsing": "maker",
    "slippage": 0.05,
    "simulationBalance": {
      "asset": 1,
      "currency": 100
    },
    "reportRoundtrips": true,
    "enabled": true
  },
  "tradingAdvisor": {
    "enabled": true,
    "method": "MACD",
    "candleSize": 60,
    "historySize": 10
  },
  "MACD": {
    "short": 10,
    "long": 21,
    "signal": 9,
    "thresholds": {
      "down": -0.025,
      "up": 0.025,
      "persistence": 1
    }
  },
  "backtest": {
    "daterange": {
      "from": "2017-11-14T03:18:00Z",
      "to": "2017-12-27T05:39:00Z"
    }
  },
  "performanceAnalyzer": {
    "riskFreeReturn": 2,
    "enabled": true
  },
  "backtestResultExporter": {
    "enabled": true,
    "writeToDisk": false,
    "data": {
      "stratUpdates": false,
      "roundtrips": true,
      "stratCandles": true,
      "stratCandleProps": [
        "close"
      ],
      "trades": true
    }
  }
}

The following things changed:

- You now post the config directly; don't wrap it in a gekkoConfig key anymore.
- What you used to put in data has now moved inside the config into a plugin called "backtestResultExporter".
- This new plugin is basically the previous data key, but some things moved around.
- Output and naming conventions are now exactly the same inside gekko as outside gekko, you can find all events and their structure here.
- You can now get the indicator values (async coming soon), see the stratUpdate event for details.
- This plugin can write the same data it posts to disk for later analysis.
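Put together, posting a backtest under the new format might look roughly like this. The endpoint path is an assumption based on the old my-server.com:3000/api/ route mentioned earlier in the thread, not something confirmed here:

```javascript
// Rough sketch of the new request body, based on the JSON the UI posts.
// Trimmed down to the parts relevant to the change; see the full JSON above.
const config = {
  watch: { exchange: 'binance', currency: 'USDT', asset: 'BTC' },
  tradingAdvisor: { enabled: true, method: 'MACD', candleSize: 60, historySize: 10 },
  backtestResultExporter: {
    enabled: true,
    writeToDisk: false,
    data: {
      stratUpdates: false,
      roundtrips: true,
      stratCandles: true,
      stratCandleProps: ['close'],
      trades: true
    }
  }
};

// The old format wrapped everything in a gekkoConfig key; the new format
// posts the config object itself, e.g. (endpoint path assumed):
//   fetch('http://localhost:3000/api/backtest', {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(config),
//   });
```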

So yes, as you said, the changes are minimal for now.
#4
So basically the format is now more sane. :)

I'll keep an eye on this, thanks for the quick updates.
#5
Another thing I am thinking about: When you run a backtest this is basically the work that needs to be done:

1. load 1min candles from disk.
2. convert them into 1h candles (assuming candle size is 60).
3. calculate indicators
4. run the strategy
5. simulate trades
6. calculate performance
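Step 2 above can be sketched as follows. The candle shape follows Gekko's usual { start, open, high, low, close, volume } fields, but this is an illustration of the conversion, not Gekko's actual implementation:

```javascript
// Sketch of step 2: fold 1-minute candles into larger candles.
function aggregate(minuteCandles, size) {
  const out = [];
  for (let i = 0; i + size <= minuteCandles.length; i += size) {
    const chunk = minuteCandles.slice(i, i + size);
    out.push({
      start: chunk[0].start,                        // bucket starts at first candle
      open: chunk[0].open,                          // open of the first minute
      high: Math.max(...chunk.map(c => c.high)),    // highest high in the bucket
      low: Math.min(...chunk.map(c => c.low)),      // lowest low in the bucket
      close: chunk[chunk.length - 1].close,         // close of the last minute
      volume: chunk.reduce((s, c) => s + c.volume, 0),
    });
  }
  return out;
}
```

Doing this once per dataset rather than once per backtest run is exactly the saving being proposed.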

I have a feeling the first 2 items take up a lot of time. If you build a GA tool that tries to figure out the best MACD params (for example), there is little reason to do steps 1 and 2 on every iteration (if you have enough memory to store the result). If you want to test strategies with different thresholds (keeping all indicator params intact) we can even include step 3 here. What about some new endpoints like this:

Code:
POST prepareBacktestData { watch, candleSize, etc. } -> returns some ID
POST backtest { the id you just got } -> returns backtest result

After the first call Gekko has stored all of these candles and keeps them in memory, so it can very quickly feed them into a backtest. This could speed up backtests by a ton, since a lot of the heavy lifting is only done once.
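A minimal sketch of that two-step flow, with everything hypothetical since the thread only proposes the endpoints:

```javascript
// Sketch of the proposed prepare-once, backtest-many flow.
const prepared = new Map(); // id -> candles kept in memory
let nextId = 0;

// POST prepareBacktestData: do the expensive loading/converting once.
function prepareBacktestData(loadCandles, watch, candleSize) {
  const id = String(nextId++);
  prepared.set(id, loadCandles(watch, candleSize)); // steps 1 + 2, done once
  return id;
}

// POST backtest: reuse the in-memory candles on every iteration.
function backtest(id, runStrategy) {
  const candles = prepared.get(id); // no disk access on repeated runs
  return runStrategy(candles);      // steps 3-6
}
```

A GA tool would call `prepareBacktestData` once and then hammer `backtest` with different strategy params against the same id.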
#6
This is really interesting Mike.

The prepare data idea is a really good one.

I wonder how this could work with multi-timeframe candles, given that is where many of us are going. It's hard to know without there being a settled solution to this, of course!
#7
(03-27-2018, 11:52 AM)askmike Wrote: Another thing I am thinking about: When you run a backtest this is basically the work that needs to be done: (snip)

I have the feeling that loading all 1-min candles when running e.g. 3-minute (or 5, or 10) candles slightly degrades performance? Why not load exactly the candles that are needed straight away: with SQL there is no need to convert, since it could be done in one single query. This combined query is of course slower than a plain SELECT *, but step 2 isn't needed then (so it's potentially faster), and the whole query result could easily be cached temporarily somewhere:

Code:
SELECT * FROM candles_BTC_USD
WHERE ID % 60 = 0

This returns every 60th candle, which in this case would stand in for a candle size of 60 minutes.

Also, the idea that the query requesting the 1-min candles takes a long time is, sorry to say, slightly false? On average a SELECT * FROM candles_xxx_xxx containing a total of 2 years and 11 months of data only takes 243-438ms (0.2-0.4s). Do note that e.g. SQLite actually caches queries, so a properly set up environment should at least cut that time in half. Also note that this measurement was not taken using Gekko but with another external tool, so if Gekko has poor optimizations or uses a poor library for SQLite that may affect things.
#8
(03-30-2018, 10:16 AM)thegamecat Wrote: I wonder how this could work with multi time candles given that is where many of us are going. (snip)

This wouldn't be a problem if the logic goes like this:

Code:
let data = getCache(id);       // returns false if there is no cache
if (!data) data = getData(id); // no cache, so do the request

So any request would just get cached, and it wouldn't matter how the request was made.
Basically the same as now, except with a check on top.
#9
I think that's exactly why this needs to consider multi-timeframe candles from the outset: you assume it's easy because there will be a flag. That is a consideration straight out of the gate!

For the record I like your proposal.
#10
Regarding db and speed... I'm generally of the belief that while SQLite is great for using Gekko out of the box, Postgres is the way forward generally, and some will swear by Mongo. So any speed benefits in this context should come from:

1) Gekko core
2) Some community guides on how to increase performance on the db itself - maybe with some tweaks to the db plugins.