Hi askmike,
A friend of mine recently pointed me to this excellent post and I was immediately struck by how similar it was to my own experience! So I thought I'd share a little about my own journey down this particular triangular-shaped rabbit hole...
I only really started looking into crypto trading ideas in mid-January this year, after a different friend had asked me to help out with some exchange connectivity for another project he was working on. I'm a developer by trade in the financial world and, not having had any experience in crypto-land, I was keen to give it a whirl. It's always fun to learn new skills!
Being a naturally cautious person, I was drawn toward arbitrage strategies instead of the usual alpha-generation approaches; what could be better than making money risk-free, right? After extensive in-depth research (OK, I spent about 20 minutes googling it one evening) it was pretty clear that the bulk of the arb going on in the crypto space (or at least what people were prepared to talk about publicly) was all about exploiting price differentials between exchanges for common pairs like BTC/USD. This didn't seem like a very good approach to me, for all the same reasons you described.
On top of this, my working assumption was that all of the exchanges were crooks and would use every trick in the book (front running, spoofing, etc.) to ensure that these simple and common approaches would not be profitable. Given that, I decided to try a less common arb strategy: triangular (three-way) arb.
I decided early on that I wasn't going to get into the ultra-low-latency / co-location game, as I was confident that would be a battle I could not win, given the exchanges can (and likely do) run such strategies themselves with far lower-latency access and without fees. Given this, and for simplicity, I started hacking out some code in Python. As it turned out, with some smart data structure choices (e.g. maintaining a real-time orderbook with O(1) updates in most cases) and by running my code 'near co-lo' (i.e. in the same AWS/DO datacenter as the exchange), I was able to achieve decent processing times and low-enough latency. Ultimately the biggest challenges I faced were not related to performance, but I'll come on to those shortly.
I initially coded up exchange APIs for Poloniex, GDax and HitBTC. This brought the first significant challenge to light: every exchange uses bespoke and often bewildering conventions, many are poorly documented, and I found more than a few bugs.
For example, Tether is confusingly called 'USD' on HitBTC rather than 'USDT' as used by most other exchanges. There is no standard (or even de facto common) convention for ticker pair naming (is it BTC-USD or USD_BTC or BTCUSD or "345"?). I especially like the exchanges that name pairs without a delimiter when they list currencies whose codes are variable length! (BTCSTEEM anyone?)
The solution is obvious, if tedious: develop a common standard for the bot to use and adjust all the calls in and out of the exchanges to account for their individual conventions and quirks. In my opinion a crypto exchange API standard is badly needed given the number of exchanges out there - and no, I don't mean FIX!
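A minimal sketch of such a translation layer, assuming an internal BASE_QUOTE convention (e.g. BTC_USDT, the form used later in this post). The alias table, the `to_internal` and `split_undelimited` helpers and the quote-first example symbol are hypothetical; each exchange's real conventions need checking against its own documentation. Splitting an undelimited symbol like BTCSTEEM only works if you already know the exchange's currency list, so the helper refuses to guess when zero or several splits are possible.

```python
# Hypothetical per-exchange currency aliases; check each exchange's docs.
CURRENCY_ALIASES = {
    "hitbtc": {"USD": "USDT"},  # HitBTC's "USD" is Tether, per the post above
}


def split_undelimited(symbol, known_currencies):
    """Split a delimiter-free symbol such as 'BTCSTEEM' into two known codes."""
    candidates = [
        (symbol[:i], symbol[i:])
        for i in range(1, len(symbol))
        if symbol[:i] in known_currencies and symbol[i:] in known_currencies
    ]
    if len(candidates) != 1:
        # Zero or several valid splits: refuse to guess.
        raise ValueError(f"cannot split {symbol!r}: {candidates}")
    return candidates[0]


def to_internal(exchange, symbol, known_currencies=(), sep=None, base_first=True):
    """Translate an exchange-native pair name into the internal BASE_QUOTE form."""
    if sep is not None:
        left, right = symbol.split(sep)
    else:
        left, right = split_undelimited(symbol, set(known_currencies))
    base, quote = (left, right) if base_first else (right, left)
    aliases = CURRENCY_ALIASES.get(exchange, {})
    return f"{aliases.get(base, base)}_{aliases.get(quote, quote)}"


# Usage (hypothetical symbols and conventions):
print(to_internal("hitbtc", "BTCUSD", {"BTC", "USD", "STEEM"}))    # BTC_USDT
print(to_internal("other", "USDT_BTC", sep="_", base_first=False))  # BTC_USDT
```

Keeping every exchange quirk inside functions like these means the strategy code only ever sees one naming scheme.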
Even better is Poloniex which, as you will have seen, doesn't use its own published websocket API (WAMP protocol) but instead uses an undocumented API which you have to reverse engineer with the help of forum posts scattered over the internet. Anyone would think they didn't want third-party bots connecting to their service!
Then there were the bugs. I'm genuinely surprised how many bugs I encountered in the exchanges given how widely they are used. One of the bugs I found on GDax is potentially very serious. I filed bug reports like a good boy and they were, of course, duly ignored. Luckily I was able to work around most of them.
The next interesting challenge was how to accurately measure latency to the exchange, as I would need the bot to stop executing if the latency grew too large for any reason (e.g. network or OS issues, or perhaps simply because my Python code was too slow). Specifically, I wanted to know how long it takes for a message to get from the exchange to my bot for processing.
Measuring the 'ping' time is obviously a useful starting point, but I really wanted to understand how 'old' a given message was when I processed it, given the various buffers and queues it goes through: within my code, in the OS network stack, out on the network and even on the exchange itself. Luckily HitBTC provides a fairly accurate 'timestamp' on its ticker messages indicating when the message was sent, so I was able to measure how old each message was when I processed it. Surprisingly (or perhaps not) not every exchange has a way to do this; in fact Poloniex does not, as far as I could see (at the time I wrote it at least; I haven't looked recently). I would be interested to know how you approached this problem.
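A minimal sketch of the message-age check and kill switch described above, assuming the exchange stamps each message with an ISO 8601 UTC send time (the real field name and format vary by exchange and are not shown here). The 0.5 s threshold, the window size and the `StalenessGuard` class are hypothetical. The measured age also includes any offset between your clock and the exchange's, so the local clock needs to be kept in sync (e.g. via NTP) for the number to mean much.

```python
from collections import deque
from datetime import datetime, timezone


def message_age_seconds(sent_iso, now=None):
    """Seconds between the exchange's send timestamp and 'now' (both UTC).

    Includes any offset between the local clock and the exchange's clock.
    """
    sent = datetime.fromisoformat(sent_iso.replace("Z", "+00:00"))
    now = now if now is not None else datetime.now(timezone.utc)
    return (now - sent).total_seconds()


class StalenessGuard:
    """Hypothetical kill switch: block trading if recent messages are too old."""

    def __init__(self, max_age_s=0.5, window=50):
        self.max_age_s = max_age_s
        self.recent_ages = deque(maxlen=window)  # only the last `window` ages

    def observe(self, age_s):
        self.recent_ages.append(age_s)

    @property
    def trading_allowed(self):
        # Conservative: one slow message in the window is enough to halt.
        return bool(self.recent_ages) and max(self.recent_ages) <= self.max_age_s


# Usage with a fixed "now" so the result is reproducible:
now = datetime(2018, 3, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
age = message_age_seconds("2018-03-01T12:00:00.200Z", now=now)
guard = StalenessGuard(max_age_s=0.5, window=3)
guard.observe(age)
print(round(age, 3), guard.trading_allowed)  # 0.3 True
```

Using the maximum over a short window is a deliberately cautious choice: a single delayed message halts trading until it ages out of the window.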
Eventually I'd developed all this into a fairly solid async-only, exchange-agnostic API (operations not available via websocket are run as async GET/POST requests on a thread to keep everything event-driven) which could be plugged into any bot 'strategy', so that the strategy code didn't have to care about any of these exchange-specific things and could focus on the algorithm itself. All fairly standard engineering stuff. I may open source this part of the code at some point as it is genuinely quite useful.
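A rough sketch of the "async GET/POST on a thread" idea using asyncio's `loop.run_in_executor`, which runs a blocking callable on the event loop's default thread pool. The `AsyncExchange` class, its `place_order` method and `fake_rest_call` are hypothetical stand-ins, not the author's code; a real adapter would also translate symbols and sign requests.

```python
import asyncio
import time


class AsyncExchange:
    """Hypothetical exchange-agnostic wrapper: strategies only ever await these methods."""

    def __init__(self, blocking_rest_call):
        # Any blocking callable, e.g. a function wrapping an HTTP POST to the exchange.
        self._rest_call = blocking_rest_call

    async def place_order(self, pair, side, qty, price):
        loop = asyncio.get_running_loop()
        # Run the blocking call on the loop's default thread pool so websocket
        # handlers keep running on the event loop while we wait.
        return await loop.run_in_executor(
            None, self._rest_call, pair, side, qty, price
        )


def fake_rest_call(pair, side, qty, price):
    """Stand-in for a real HTTP request; sleeps to mimic network latency."""
    time.sleep(0.05)
    return {"pair": pair, "side": side, "qty": qty, "price": price, "status": "accepted"}


async def demo():
    exchange = AsyncExchange(fake_rest_call)
    # Three legs submitted concurrently, as a three-way arb would want.
    return await asyncio.gather(
        exchange.place_order("BTC_USDT", "buy", 0.01, 10000),
        exchange.place_order("NXT_BTC", "buy", 500, 0.00002),
        exchange.place_order("NXT_USDT", "sell", 500, 0.21),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because each blocking call sits on its own worker thread, the three requests overlap instead of running back to back, and the event loop stays free to process market data.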
With that in place, I (along with my friend who had dragged me into this world!) then wrote a very simple Python bot which would do the obvious thing: listen to ticker bid/ask prices over the websocket (not orderbook updates initially; I switched to those later when the 'liquidity' penny dropped, more on that later), then do the maths and execute (simulated) orders for the profitable ones after accounting for fees and expected slippage.
One of the great things about the crypto world is that there are so many currency pairs that the number of three-way arb possibilities is huge! Rather than targeting a specific triangle (say, USDT->BTC->ETH->USDT) I coded it to check every possible three-way permutation on every market tick. This took a little optimising to keep the cost of calculation down, as there could be ~2000+ permutations on some exchanges which support a large number of currencies.
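One way to enumerate the candidate triangles, assuming pairs are available as (base, quote) tuples: build an adjacency set per currency and look for common neighbours. Precomputing an index from each pair to the triangles that contain it means a tick on one pair only triggers re-evaluation of the triangles that use it, which is one plausible way to keep the per-tick cost down (the post doesn't say which optimisations were actually used). The function names are hypothetical.

```python
from collections import defaultdict


def find_triangles(pairs):
    """Return every set of three currencies that are all directly tradeable."""
    neighbours = defaultdict(set)
    for base, quote in pairs:
        neighbours[base].add(quote)
        neighbours[quote].add(base)
    triangles = set()
    for a in neighbours:
        for b in neighbours[a]:
            # Any currency adjacent to both a and b closes a triangle.
            for c in neighbours[a] & neighbours[b]:
                triangles.add(tuple(sorted((a, b, c))))
    return sorted(triangles)


def index_by_pair(triangles):
    """Map each pair (order-independent) to the triangles that use it."""
    by_pair = defaultdict(list)
    for a, b, c in triangles:
        for edge in ((a, b), (b, c), (a, c)):
            by_pair[frozenset(edge)].append((a, b, c))
    return by_pair


def cycles_from(start, triangle):
    """Both trading directions around a triangle, starting and ending at `start`."""
    x, y = [c for c in triangle if c != start]
    return [[start, x, y, start], [start, y, x, start]]


pairs = [("BTC", "USDT"), ("ETH", "USDT"), ("ETH", "BTC"),
         ("NXT", "BTC"), ("NXT", "USDT"), ("LTC", "BTC")]
triangles = find_triangles(pairs)
print(triangles)  # [('BTC', 'ETH', 'USDT'), ('BTC', 'NXT', 'USDT')]
affected = index_by_pair(triangles)[frozenset(("NXT", "BTC"))]
print(affected)   # [('BTC', 'NXT', 'USDT')]
```

Each triangle can be traded in two directions from a given starting currency, which is why `cycles_from` returns two paths.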
This then brought up the next challenge: getting the maths right! It sounds easy in theory but I found it surprisingly difficult in practice. Why? One explanation (other than me being stupid and tired; this was an evenings-and-weekends pet project) is that in crypto-land we're always dealing with tiny fractional numbers (e.g. 0.000001 of currency X is worth 0.0000001 of currency Y), so it is very hard to get an intuitive feel for whether something is right or wrong. This is especially hard when you're trying to automatically deal with hundreds of currencies and thousands of three-ways!
Another complexity here is that to complete a three-way opportunity you typically have to buy two ticker pairs and sell a third, so there is some further mental gymnastics involved. For example, to execute USDT->BTC->NXT->USDT you first have to sell USDT to get BTC, but the actual exchange ticker is BTC_USDT (BTC is the base currency, USDT the quote currency), so you instead have to buy the BTC_USDT ticker. You'd then (in parallel, as you note) go on to buy NXT_BTC and sell NXT_USDT to complete the round trip.
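A sketch of the buy/sell bookkeeping for the USDT->BTC->NXT->USDT example, assuming the internal BASE_QUOTE naming convention and top-of-book quotes. Going from X to Y means buying Y_X at the ask if that pair is listed, or otherwise selling X_Y at the bid. The prices below are made up, the taker fee is assumed to be deducted from each leg's proceeds, and liquidity and lot sizes are ignored; the helper names are hypothetical.

```python
def plan_legs(path, listed_pairs):
    """Turn a currency path into (side, pair) legs using BASE_QUOTE pair names."""
    legs = []
    for src, dst in zip(path, path[1:]):
        if f"{dst}_{src}" in listed_pairs:
            legs.append(("buy", f"{dst}_{src}"))    # spend quote `src`, receive base `dst`
        elif f"{src}_{dst}" in listed_pairs:
            legs.append(("sell", f"{src}_{dst}"))   # spend base `src`, receive quote `dst`
        else:
            raise KeyError(f"no market between {src} and {dst}")
    return legs


def simulate_round_trip(amount, legs, quotes, taker_fee):
    """quotes maps pair -> (best_bid, best_ask). Ignores liquidity and lot sizes."""
    for side, pair in legs:
        bid, ask = quotes[pair]
        amount = amount / ask if side == "buy" else amount * bid
        amount *= 1 - taker_fee  # assumed: fee charged on what each leg receives
    return amount


legs = plan_legs(["USDT", "BTC", "NXT", "USDT"], {"BTC_USDT", "NXT_BTC", "NXT_USDT"})
print(legs)  # [('buy', 'BTC_USDT'), ('buy', 'NXT_BTC'), ('sell', 'NXT_USDT')]

# Made-up quotes purely for illustration.
quotes = {"BTC_USDT": (9990, 10000), "NXT_BTC": (0.0000199, 0.00002), "NXT_USDT": (0.2012, 0.2015)}
print(simulate_round_trip(100, legs, quotes, taker_fee=0.001))
```

`plan_legs` reproduces the buy, buy, sell pattern described above; doing this derivation in code rather than in your head removes one source of the sanity-questioning.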
This stuff does make you question your sanity at times!
After running simulations for a while against a few exchanges I soon zeroed in on HitBTC, as it seemed to have the magic combination of low taker fees and a high frequency of profitable three-way arb opportunities, largely, I think, because of the huge universe of tickers it allows you to trade.
I found that Poloniex was throwing up the occasional arb opportunity, but most lasted less than 100-200 ms and it was clear that someone else was running the same strategy as me at the same time (so thanks for that!). Interestingly, on both Poloniex and HitBTC I found that very small opportunities (e.g. <0.1% profit after fees) lasted much longer, suggesting that my competition were going after the bigger moves and so there was some scope for 'bottom feeding'.
Then came the first 'oh s**t' moment.
Until then I'd been looking for arb opportunities and simulating based on a fixed notional amount at the current best bid/offer prices. So, for example, I'd start with 100 USDT and simulate a three-way arb via two other currencies, ending up back in USDT. Do I end up with more than I started with after fees? If so, great! We have a winner.
The (now gapingly obvious to me) flaw here is, of course, that there may not be sufficient liquidity at the best bid/offer price to execute all three pairs for the notional amount. To solve this I moved away from trying to round-trip a fixed notional amount and instead worked out the 'highest common liquidity' between the three ticker pairs involved in the three-way transaction.
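A sketch of a 'highest common liquidity' calculation using only the best level of each book, assuming `book_tops` maps each pair to ((bid, bid_qty), (ask, ask_qty)) with quantities in base-currency units, and a fee deducted from each leg's proceeds. Each leg's capacity is converted back into units of the starting currency and the smallest one wins. The function name and data layout are hypothetical, and the author's exact method isn't described beyond the idea itself.

```python
def highest_common_liquidity(legs, book_tops, taker_fee):
    """Largest starting amount that fits within the best level of every leg.

    legs: [(side, pair), ...]; book_tops: pair -> ((bid, bid_qty), (ask, ask_qty)),
    quantities in base-currency units.
    """
    growth = 1.0  # units of this leg's input currency per unit of starting currency
    max_start = float("inf")
    for side, pair in legs:
        (bid, bid_qty), (ask, ask_qty) = book_tops[pair]
        if side == "buy":
            capacity_in = ask * ask_qty   # quote currency the best ask can absorb
            rate = 1 / ask
        else:
            capacity_in = bid_qty         # base currency the best bid can absorb
            rate = bid
        max_start = min(max_start, capacity_in / growth)
        growth *= rate * (1 - taker_fee)
    return max_start


legs = [("buy", "BTC_USDT"), ("buy", "NXT_BTC"), ("sell", "NXT_USDT")]
# Made-up top-of-book prices and sizes.
book_tops = {
    "BTC_USDT": ((9990, 0.8), (10000, 0.5)),
    "NXT_BTC": ((0.0000199, 900), (0.00002, 300)),
    "NXT_USDT": ((0.2012, 1000), (0.2015, 400)),
}
print(highest_common_liquidity(legs, book_tops, taker_fee=0.0))  # roughly 60 (USDT)
```

Here the thin NXT_BTC ask (300 NXT, i.e. 0.006 BTC) caps the whole round trip at about 60 USDT, even though the other two legs could absorb far more.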
To do this I had to move beyond the simple feed of ticker prices (which contain only the best bid/ask, not the quantities) and instead consume the full orderbook, so as to have access to the quantity of orders available at the best bid and ask. This took a little work as I wanted to ensure I could maintain the orderbook efficiently, and so avoided simple ordered lists, binary trees and the like. Ultimately a simple solution would have been fine given ultra-low latency was never the goal, but still...
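A minimal sketch of one way to get O(1) updates in the common case: keep each side of the book as a dict of price -> quantity plus a cached best price. Inserting or changing a level is O(1); only removing the current best level forces an O(n) rescan. This is an illustrative structure, not necessarily what the author built; prices are `Decimal`s so that dictionary keys compare exactly, and the class names are hypothetical.

```python
from decimal import Decimal


class BookSide:
    """One side of an orderbook: price -> quantity, plus a cached best price."""

    def __init__(self, is_bid):
        self.is_bid = is_bid
        self.levels = {}
        self.best = None

    def _better(self, a, b):
        return a > b if self.is_bid else a < b

    def update(self, price, qty):
        """Apply a level update; qty == 0 means the level was removed."""
        if qty == 0:
            self.levels.pop(price, None)
            if price == self.best:
                # Slow path, O(n): the best level vanished, so rescan.
                pick = max if self.is_bid else min
                self.best = pick(self.levels, default=None)
        else:
            self.levels[price] = qty   # O(1) insert or overwrite
            if self.best is None or self._better(price, self.best):
                self.best = price

    def top(self):
        """(price, qty) at the best level, or None if this side is empty."""
        return None if self.best is None else (self.best, self.levels[self.best])


class OrderBook:
    def __init__(self):
        self.bids = BookSide(is_bid=True)
        self.asks = BookSide(is_bid=False)


book = OrderBook()
book.bids.update(Decimal("0.0000198"), Decimal("500"))
book.bids.update(Decimal("0.0000199"), Decimal("900"))
book.bids.update(Decimal("0.0000199"), Decimal("0"))   # best bid removed
print(book.bids.top())  # (Decimal('0.0000198'), Decimal('500'))
```

The trade-off is that a burst of cancellations at the top of a deep book costs a rescan each time; for the top-of-book queries a triangular arb needs, that is usually a rare case.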
What I found after doing this is that arb profitability dropped significantly: many of the currencies which regularly appeared in profitable three-ways had very poor liquidity on the orderbook, which limited the 'highest common liquidity' I could transact across the three-way. Typically I found that executing an amount of around 0.01 BTC was possible for only two of the three pairs.
Then came 'oh s**t' moment #2.
When simulating I could quite happily buy and sell tiny fractions of a coin so that the amount I was buying or selling in each pair was roughly equal. For example, if the 'highest common liquidity' between the chosen three currencies was, say, $0.001, then I would buy/sell that much of each ticker in the three-way.
However, in the real world the exchanges have minimum order quantities and minimum quantity increments (sometimes published, sometimes not, and discovered by trial and error). After some analysis it turned out that on HitBTC the minimum order sizes, in USD terms, vary dramatically between pairs (by orders of magnitude). Most notably, the minimum order size and minimum increment for BTC-based pairs is 0.01 BTC, which is very coarse-grained. Essentially it means that if the 'highest common liquidity' was 0.015 BTC, my choice was either to trade 0.01 BTC or 0.02 BTC.
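A sketch of rounding a desired quantity to an exchange's minimum size and increment using `decimal.Decimal`, which avoids binary floating-point surprises with values like 0.01. The 0.01 BTC step mirrors the HitBTC figure quoted above; the helper names are hypothetical. The last helper shows why coarse increments make the legs lop-sided: rounding 0.015 down to 0.01 drops a third of the intended size on that leg, while a leg with a fine increment loses almost nothing.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_UP


def round_to_step(qty, step, rounding=ROUND_DOWN):
    """Round qty to a whole multiple of the exchange's quantity increment."""
    return (qty / step).quantize(Decimal("1"), rounding=rounding) * step


def tradeable_qty(qty, min_qty, step):
    """Largest valid order size not exceeding qty, or None if below the minimum."""
    rounded = round_to_step(qty, step)
    return rounded if rounded >= min_qty else None


def rounding_shortfall(qty, step):
    """Fraction of the intended size lost by rounding down to the increment."""
    return (qty - round_to_step(qty, step)) / qty


step = Decimal("0.01")  # BTC increment quoted in the post; other pairs differ
wanted = Decimal("0.015")
print(round_to_step(wanted, step), round_to_step(wanted, step, ROUND_UP))  # 0.01 0.02
print(tradeable_qty(Decimal("0.009"), min_qty=step, step=step))            # None
print(rounding_shortfall(wanted, step))  # about a third of this leg is lost
```

Rounding up instead is no better: it overshoots the available liquidity on that leg, so either way the three legs no longer match.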
The upshot is that it makes performing genuine three-way arb essentially impossible, as the amounts we trade in each pair are significantly lop-sided. I think this is an artifact of history: when HitBTC first set the minimum quantities they were likely all roughly in line, but over time, as prices have moved dramatically, they have drifted apart and have not been reset.
This is the point where I decided to pause and move on to other, more tantalising strategies rather than continuing with this one on a different exchange. It has served as a great learning opportunity and given me the foundation I needed to go on to work on some more interesting things, but I'll leave that for another day!
> But the really interesting part of this system is that having less fees means you are able to do more opportunities. Since opportunities that are not profitable with 0.25% might very well be with a 0.20% fee. This difference might not sound like a lot, but it is huge.
I couldn't agree more! Fees are a huge limiting factor for any market-taker strategy. I never executed enough volume to move down the fee table, but it's clear that is the way you have to go to unearth the profitable opportunities. Whoever said size doesn't matter was wrong!
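A quick back-of-the-envelope check of why the fee tier matters so much, assuming each of the three legs is a taker trade whose fee is deducted from that leg's proceeds: the gross price discrepancy needed just to break even is 1/(1 - f)^3 - 1. The function name is hypothetical.

```python
def breakeven_gross_edge(taker_fee, legs=3):
    """Gross edge a round trip of `legs` taker trades needs just to break even."""
    return 1 / (1 - taker_fee) ** legs - 1


for fee in (0.0025, 0.0020):
    print(f"fee {fee:.2%}: break-even edge {breakeven_gross_edge(fee):.3%}")
# fee 0.25%: break-even edge 0.754%
# fee 0.20%: break-even edge 0.602%
```

That is roughly 0.75% versus 0.60% of gross edge, so every opportunity in the ~0.15 percentage-point band between them only becomes profitable at the lower tier.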
> The first trick is not breaking any rule, but it leads up to one of the main tricks up Agent Smith's sleeve.
> Poloniex error 422: Nonce must be greater than X. You provided Y.
I also came across this issue on Poloniex but, as I was largely targeting HitBTC at that point, didn't dig into it much, so thanks for saving me the effort!
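A minimal sketch of a strictly increasing nonce source for a single API key, shared by all threads in one process. The `NonceGenerator` class is hypothetical, and it is only part of the story: since the exchange presumably compares each nonce with the last one it has accepted, concurrent requests signed with increasing nonces can still arrive out of order and trigger this error. Serialising signed requests per key, or giving each concurrent worker its own key, are common ways around that.

```python
import threading
import time


class NonceGenerator:
    """Strictly increasing nonces for one API key, safe across threads in one process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next(self):
        with self._lock:
            candidate = time.time_ns() // 1000  # microseconds since the epoch
            # Never repeat or go backwards, even if the clock does.
            self._last = max(candidate, self._last + 1)
            return self._last


nonces = NonceGenerator()
first, second = nonces.next(), nonces.next()
print(second > first)  # True
```

Seeding from the clock keeps nonces increasing across restarts, while the `max(..., last + 1)` step covers two calls landing in the same microsecond.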
Thanks again for posting this, a great read and also nice to hear of other peoples experiences of coding trading bots in this magic-internet-money world.
Good luck with your index arb bot