[ARBITRAGE] Agent Smith: Arbing the great bull run of 2017
#1
This is a story about curiosity and exploration. And how after countless evenings of debugging, tinkering and making mistakes (read: losing money) I ended up with a small piece of code that was moving serious money and pocketing a bit as well. Let me lead you down the rabbit hole of my creation: Agent Smith.

[for the impatient, scroll down to "The Climax".]

[Image: 38663056-f7e6b2b4-3e5f-11e8-85a6-abf0ed445c93.jpg]

It all started on a rainy night in May of 2017 (I was living in Galway, Ireland and it rains an awful lot on the west coast). Playing more and more with the cryptocurrency exchange Poloniex, I started to worry about the execution price of trades on some markets because of illiquid orderbooks:

When you have some Bitcoin and you want to trade them for Ether (two different cryptocurrencies) you can do that on Poloniex directly on their BTC/ETH market. If you want to trade instantly you will need to buy the Ether from someone who is currently offering to sell it (called an "ask offer"). The only alternative is creating your own offer to buy (a "bid offer"), hoping that someone will sell to you. Usually taking an ask offer is fine as there are a number of people currently willing to buy and sell (all publicly visible in something called the "orderbook"). But in some scenarios of heavy price movements (like during a crash) I noticed that the prices being offered were terrible. There are moments when the cheapest way to buy Ether for Bitcoin on Poloniex is not buying directly on the BTC/ETH market but taking a stranger path: sell your Bitcoin into USDT and use those to buy the Ether. This is very unintuitive since every time you do a trade you have to pay fees. Doing two trades means twice the amount of fees compared to a single trade.

When working on trading systems you often come across situations where the markets don't behave to your advantage when you are trying to do something specific. Sometimes you can measure and (statistically) predict these situations. Even more rarely you can exploit them; those are called market inefficiencies. Taming them can be a hard and bumpy path and most don't tend to stick around for long (due to natural shifts in the markets). Whatever the case: you will have to fight tooth and nail for them since you might not be the only one who knows about them.

Enter arbitrage

Imagine a cheese market where people are buying and selling cheese all day long. If you know a certain shop selling Gouda cheese for $10 and you hear someone saying he is willing to buy a Gouda cheese for $11 you just found a way to make $1! Arbitrage is a well understood and studied practice in a lot of markets. There are huge arbitrage opportunities between Bitcoin on different exchanges:

[Image: 38663092-12a955fc-3e60-11e8-9cbe-1518ed74aded.png]

Just like the Gouda cheese, you can see that Bitcoin is trading at a different price wherever you look. Though compared to a few months ago the prices are a lot more similar (or in other words, the markets are more efficient). Doing Bitcoin arbitrage can be as simple as buying bitcoin on the exchange where it is cheap (in this picture Bittrex), sending it over and selling it on the one where it is expensive (in this picture HitBTC).

Unfortunately doing this kind of arbitrage is a tough nut to crack. These are the factors you need to take into account:

- The cost of trading (buying and selling), these are trading fees.*
- The cost of moving the money (withdrawing and depositing), different fees.
- The price at which you can actually trade: the picture above does not show the price at which you can buy or sell on a market, just what the price of the last trade was. You actually have to buy from the lowest ask and sell to the highest bid (the bid is always lower than the ask, meaning this eats into your profit).
- The amount of money you actually can buy or sell based on the top ask and bid.
- The risk of having a lot of money on multiple exchanges.
- The risk of moving a lot of money to different exchanges via your bank account: usually banks don't like it when you receive thousands of dollars a day from a company in one country and you send it to a company in another country (looks an awful lot like money laundering). Though this very much depends where you are based and what kind of relationship you have with your bank.
- The risk of not being fast enough: unfortunately you are probably not the first one to see this opportunity, turning this whole thing into a race. Inefficiencies exploitable by arbitrage are inherently zero sum: there is only a fixed amount of money to be made, and a lot of people are out to get it. This is the reason you won't find any good arbitrage bots online.
- The risk of holding bitcoin at all (called market exposure).

*Fees are not always equal, I'll go over this more in "Speed is not everything" down below.

Tri arb

However there are a lot of ways to do arbitrage, including exploiting the inefficiency I saw on Poloniex. Poloniex has a lot of different cryptocurrencies you can trade on a lot of different markets. Going back to the BTC/ETH example: there are three different currencies linked by three markets:

[Image: 38663137-2e904dde-3e60-11e8-9b7d-3fd1fc816c17.png]

You can trade any of these three currencies for one another directly on the market. Each time you do you have two costs:

- You need to pay a fee to Poloniex, this is called a taker fee.
- To trade instantly you need to cross from the price people are offering to buy at to the price people are offering to sell at (this difference is called the spread).

This means that if you have some Bitcoin, you can trade them into Ether, you can trade those into USDT and finally you can convert these back into Bitcoin. Doing this is not the best deal, because you almost always lose money.

Note the "almost" in there. Let's talk about the times you don't lose money but actually make money ;)

Equilibrium

[Image: 38663155-40e41952-3e60-11e8-8f84-db059ff81399.png]

Imagine the red numbers describing the prices of the markets. For simplicity's sake let's say they describe the type of order (bid or ask) we are taking in the example above (by going Bitcoin -> Ether -> USDT -> Bitcoin). In this situation you would lose money doing these three trades, since you'd lose out on fees for each trade. Let's assume your fee is 1% and you start with 1 Bitcoin:

- You trade your 1 Bitcoin into 10 Ether, after paying the fee you have 9.9 Ether
- You trade your 9.9 Ether into 990 USDT, after paying the fee you have 980.1 USDT
- You trade your 980.1 USDT into 0.9801 Bitcoin, after paying the fee you have 0.970[..] BTC

But the sharp readers among you might have noticed a clue:

Quote:But in some scenarios of heavy price movements (like during a crash) I noticed that [...]

Imagine that Bitcoin (on the BTC/USD market) just crashed down from 1000 to 900, like shown here:

[Image: 38663188-61e9ec26-3e60-11e8-9a7f-3a0d3f1d5a8c.png]

Let's see what happens this time:

- You trade your 1 Bitcoin into 10 Ether, after paying the fee you have 9.9 Ether
- You trade your 9.9 Ether into 990 USDT, after paying the fee you have 980.1 USDT
- You trade your 980.1 USDT into 1.089 Bitcoin, after paying the fee you have 1.07811 BTC

Bam! We made almost 8% profit in a single set of trades! Even though the price never crashes this much in a blink, if you can keep doing this all day long and make a penny every time you might be able to afford a few lambos by now.
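The roundtrip arithmetic above can be replayed in a few lines. This is only a sketch under the assumptions of the example: a flat 1% fee taken from the proceeds of every trade, and the round prices from the pictures (10 ETH per BTC, 100 USDT per ETH, and BTC/USDT at either 1000 or 900). The function and parameter names are made up for illustration.

```python
def roundtrip(start_btc, eth_per_btc, usdt_per_eth, usdt_per_btc, fee):
    """Return the BTC balance after going BTC -> ETH -> USDT -> BTC.

    fee is the taker fee as a fraction (0.01 means 1%), assumed to be
    deducted from what you receive on each of the three trades.
    """
    keep = 1.0 - fee
    eth = start_btc * eth_per_btc * keep   # leg 1: sell BTC for ETH
    usdt = eth * usdt_per_eth * keep       # leg 2: sell ETH for USDT
    btc = usdt / usdt_per_btc * keep       # leg 3: buy BTC with USDT
    return btc


balanced = roundtrip(1.0, 10, 100, 1000, fee=0.01)
after_crash = roundtrip(1.0, 10, 100, 900, fee=0.01)
print(f"markets in equilibrium: {balanced:.6f} BTC")
print(f"BTC/USDT dropped to 900: {after_crash:.6f} BTC")
```

An opportunity exists whenever the returned amount is larger than the amount you started with.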

Scouting the rabbit hole

After I figured this all out I had to do something with it. So back I went from the drawing board to the keyboard. Agent Smith was born as a simple 20 line script that connected to Poloniex and asked to receive all messages that had to do with orders being offered on these three markets. Most exchanges offer a (semi-)realtime connection using websockets: you call the exchange and tell them what information you are interested in and as soon as anything happens they will message you. And messaging Poloniex did: a few dozen times a second something changed in the offers people currently wanted to buy or sell at on any of those three markets.

After every update my script would simply repeat the above calculations with real prices. If the final Bitcoin amount was more than the start amount it would make a little note. I tasked my Bitcoin full node Raspberry Pi with keeping the script running while I went to bed.

The next morning (not going to lie: a few weeks later. It took a while to keep the whole thing running with stable websocket code and actual working calculations) I saw the notes from my faithful Agent Smith: There have been times when doing the full buy-sell-buy roundtrip of three trades I would actually have ended up with more money than I started with!

Getting into the race

This is where the fun started: the notes of Agent Smith showed me that these scenarios never lasted longer than a second. There are a few reasons for this; the easiest (and most fun) explanation was that someone else had figured this out before me and was doing the trades (which causes the opportunity to cease existing). How to beat this hypothetical competing agent of the system?

The most obvious answer here is to be faster than him, faster than my new sworn arch enemy Agent Jones.

[Image: 38728425-0d0aa8e6-3f3a-11e8-8f55-b67ae53c3e08.jpg]

Back to the keyboard I went, this time replacing Agent Smith's "make a note" code with code that actually created the trades on Poloniex, automatically trading the exact amounts necessary for me to end up with more bitcoin. And off Agent Smith went, sometimes making money but more often losing money:

In the example above we are buying Bitcoin for our USDT as soon as Bitcoin crashes down. If Agent Smith is unable to buy from the "ask offer" he saw a moment earlier (for 900 USDT) he will need to buy at a different price, a higher price since the 900 ask offer was already taken by Agent Jones. Agent Smith can go from making money to losing money in a fraction of a second.

Agent Smith entered the race upon two realizations:

- When you do an order at Poloniex it will take up to a second for Poloniex to process. Meaning that if you do the trade from BTC into ETH first and only after that do the second trade you will always be too slow. Instead you need to have BTC, ETH and USDT ready to make all three trades at the same time the microsecond the opportunity arises.
- The profits are so small that as soon as you are not fast enough for one trade you will almost always lose money. However dealing with this situation in the best way possible is as important as being extremely fast, since unfortunately you will not always be fast enough (or Agent Smith wasn't anyway).

Understanding the game

[Image: 38663422-eff88504-3e60-11e8-879a-f98f97138526.jpg]

I'm not sure if you read the full disclaimer of the red pill, but the rabbit hole goes all the way down the nitty gritty technical stuff. Let us not forget that agents are software programs in the end.

So how can we make Agent Smith faster than other potential agents? First we need to lay out the race course so that we even know what we mean by that: the exploitable situation described above is merely a snapshot from the perspective of the observer. When Agent Smith sees a certain situation (specific prices at different markets) it is merely a snapshot (or an internal representation that combines multiple snapshots) of some point in time after the fact.

To better explain: computers can only do one thing at a time. And the computers operated by exchanges run software that does the actual trading based on orders submitted by people (and agents). We already know half the story: there is a list of people willing to buy and sell right now, and their orders are stored in this thing called the orderbook. The fact that they are in the orderbook means that they cannot trade with each other at the price they specified:

As soon as Alice wants to buy bitcoin for $1000 her order will go in the orderbook (a bid). When Bob comes along wanting to sell his bitcoin for $1000 his order will be matched against Alice's order. If Bob wanted to sell his bitcoin for $1001 instead it would not have gotten matched, and his order would have ended up in the orderbook instead (an ask). The software that handles all of this is called the matching engine. And like any other software it can only do a single thing at a time, meaning that it takes in a list of orders and either puts them into the orderbook OR matches them with orders in the orderbook, like so:

input:

1. buy 100 @ 1000  -> into the book!
2. buy 100 @ 1001  -> into the book!
3. buy 100 @ 1001  -> into the book!
4. sell 100 @ 1001 -> match against order 2! (and remove order 2 from the book)
5. sell 100 @ 1002 -> into the book!
6. sell 100 @ 1001 -> match against order 3! (and remove order 3 from the book)
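To make the one-order-at-a-time behaviour concrete, here is a toy matching engine that replays the six orders listed above. It is a sketch, not how Poloniex actually implements theirs: it assumes plain limit orders and price-time priority (best price first, then oldest order first), and the class and method names are invented for this example.

```python
class MatchingEngine:
    """Toy single-threaded matching engine with price-time priority."""

    def __init__(self):
        self.bids = []    # resting buy orders, kept in arrival order
        self.asks = []    # resting sell orders, kept in arrival order
        self.events = []  # human readable log of what happened

    def submit(self, order_id, side, amount, price):
        if side == "buy":
            book, opposite = self.bids, self.asks
        else:
            book, opposite = self.asks, self.bids
        remaining = amount
        while remaining > 0:
            if side == "buy":
                # A buy can match any ask at or below its limit; cheapest first.
                candidates = [o for o in opposite if o["price"] <= price]
                best = min(candidates, key=lambda o: o["price"], default=None)
            else:
                # A sell can match any bid at or above its limit; highest first.
                candidates = [o for o in opposite if o["price"] >= price]
                best = max(candidates, key=lambda o: o["price"], default=None)
            if best is None:
                break
            traded = min(remaining, best["amount"])
            best["amount"] -= traded
            remaining -= traded
            self.events.append(f"{order_id}: match against order {best['id']}")
            if best["amount"] == 0:
                opposite.remove(best)
        if remaining > 0:
            book.append({"id": order_id, "price": price, "amount": remaining})
            self.events.append(f"{order_id}: into the book")


engine = MatchingEngine()
orders = [
    (1, "buy", 100, 1000),
    (2, "buy", 100, 1001),
    (3, "buy", 100, 1001),
    (4, "sell", 100, 1001),
    (5, "sell", 100, 1002),
    (6, "sell", 100, 1001),
]
for order in orders:  # strictly one order after the other
    engine.submit(*order)
print("\n".join(engine.events))
```

Because `min` and `max` return the first of several equal candidates and the lists are kept in arrival order, order 2 is matched before order 3 even though both bid 1001.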

Every time a new order goes in, the orderbook will change in some way, and Agent Smith wants to be the first to know so it can check the prices and outpace Agent Jones if there is an opportunity. From a time perspective this is the order in which things happen:


[Image: 38663462-080b50c2-3e61-11e8-92b2-009f11638007.png]

However Poloniex needs some time to process Alice's order, and during that time it will queue up other orders that come in, like so:

[Image: 38663479-16539ca2-3e61-11e8-9dc1-1e4e62b87f9e.png]

Note that after Alice's order is processed and Agent Smith sees the updated orderbook, the order from Bob is about to be processed. So even though the orderbook as observed after Alice's order did actually exist, there is NO way to act on it without waiting for Bob's order first (who might be trading against Alice's order). There is also no information about this queue, so whenever Agent Smith receives an orderbook update it might be a glimpse of a situation that cannot be acted on. If this queue gets too big you can see how dangerous this can get, since everyone will be submitting orders based on some orderbook state that (by the time their submitted orders are executed) is no longer true.

This however is a problem everyone has to deal with (unless you do things like adding a lot of small orders to measure the throughput speed of their matching engine, which is hard to do without keeping Agent Smith 100% focused on responding as fast as possible to arbitrage opportunities).

The next part is networking: when Poloniex has an updated orderbook it needs to tell Agent Smith. This is a websocket frame that travels over the internet from Poloniex's infrastructure all the way to my faithful Raspberry Pi that is running Agent Smith. In order to be sure that Agent Smith sees this as fast as possible we want to run Agent Smith as close to Poloniex's infrastructure as possible. But here is a problem!

Poloniex (and most other exchanges) use a service called CloudFlare that sits between Poloniex and everyone else. CloudFlare does things like protecting Poloniex from DDoS attacks and such. So let's put Agent Smith next to CloudFlare? Well, CloudFlare runs servers all over the globe as part of a CDN/edge network. This allows them to cache Poloniex's website all over the world and provide snappy experiences to people in Europe as well as people in South America (since they can cache a lot in both locations).

Let's go over the timeline of everything that is happening:

[Image: 38663519-35718842-3e61-11e8-8311-b5ce6f63c6c6.png]

As you can see, there are two steps where something needs to go through CloudFlare and the internet. The red numbers are (roughly, averaged) the time each step takes in milliseconds (1000 is one second). As you might notice, all numbers around the internet are huge compared to the rest. So the biggest thing you can do to speed up this kind of speed race has to do with optimizing your connection to Poloniex.

The easiest way to get a faster connection is to get a fast server close to Poloniex (and CloudFlare). I've tried a lot of different servers hosted at AWS, Digital Ocean and Vultr. But eventually I moved Agent Smith from the Raspberry Pi running in Ireland to a server hosted in the datacenter that Poloniex used before they started using CloudFlare (I'll leave this as an exercise to the reader ;)).

And off Agent Smith went, slowly increasing its success rate against Agent Jones at being the fastest Agent in the system.

Speed is not everything

But speed is not the only dimension of this game, there is another big one that has to do with fees: in the arbitrage example we were using a placeholder fee of 1%. On Poloniex you start out with a taker fee of 0.25%. However this number goes down the more money you trade on the exchange (in total per month). So in the example above we were trading roughly 1 bitcoin three times, this means that that roundtrip generated about 3 bitcoin in volume. Here is the fee schedule:

[Image: 38663557-488867e8-3e61-11e8-8e18-776b00a4efbc.png]

As you can see in the schedule, the more money you are able to move the less fees you pay. And when you are paying less fees you are making more profit on each roundtrip. But the really interesting part of this system is that having lower fees means you are able to take more opportunities, since opportunities that are not profitable with a 0.25% fee might very well be with a 0.20% fee. This difference might not sound like a lot, but it is huge.
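A quick way to see why a 0.05% fee difference matters is to compute how much price edge a three-trade roundtrip needs before it breaks even. The sketch below assumes the same taker fee is charged on all three legs; the helper names and the 0.7% example edge are made up for illustration.

```python
def required_gross_edge(fee, legs=3):
    """Price edge (before fees) at which a roundtrip exactly breaks even."""
    return 1.0 / (1.0 - fee) ** legs - 1.0


def is_profitable(gross_edge, fee, legs=3):
    """True if a roundtrip with this pre-fee edge still makes money after fees."""
    return (1.0 + gross_edge) * (1.0 - fee) ** legs > 1.0


for fee in (0.0025, 0.0020):
    edge = required_gross_edge(fee)
    print(f"fee {fee:.2%}: need more than {edge:.4%} edge before fees")

# An opportunity that is 0.7% "off" before fees:
print(is_profitable(0.007, fee=0.0025))
print(is_profitable(0.007, fee=0.0020))
```

Every opportunity whose edge falls between the two break-even points is only available to the agent on the cheaper fee tier.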

A note on Poloniex's fee schedule: ever since I signed up for the exchange somewhere in 2015 this fee schedule has remained static. Back in 2015 a volume of 600 bitcoin was worth a lot less than it is today (with Bitcoin only having a price of around $300 back then).

So throughout the second part of 2017 Agent Smith was slowly building up volume over time to get some fee discounts. It never got all the way down the fee tiers, but I am pretty sure no one is getting there. Their top tier requires moving roughly a third of all volume on Poloniex. Agents at that level are playing a different game, which is the art of making markets.

Agent Smith kept evolving

[Image: 38663579-5ad842f6-3e61-11e8-868d-dfb122d52422.jpg]

Even though the base problem (calculating whether swapping bitcoin->ether->usdt->bitcoin is profitable) is simple enough, there is an awful lot of complexity that comes with acting on profitable scenarios and making sure we constantly manage our liquidity (meaning, making sure we have enough BTC, ETH and USDT at all times).

After my dayjob (working on a blockchain project for a bank) I slowly kept tinkering and improving this frankenstein creation we now call Agent Smith. The beauty of this triangular arbitrage is that you can apply it to more markets besides ETH. After a few weeks I was trading on all USDT markets that crossed with BTC markets (such as LTC, XRP and a few others).

The initial version completely locked out while it was doing a roundtrip, meaning that if it was currently trading on an opportunity it would stop watching for new ones, since the risk of spotting an opportunity that relies on the same order you already sent out a trade for is too big. Slowly over time I upgraded this system to lock per market (not try to arb BTC/USD if a trade there is currently pending). This evolved into a system that locked on individual rate levels of one side of a specific market (BTC/USD ask @ 1001) and eventually would virtually assume only the pending order was taken already, ignoring that part of the orders currently in the orderbook.
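The per-rate-level locking step can be illustrated with a small sketch. This is not Agent Smith's actual code: it simply assumes every leg of an opportunity can be described by a (market, side, rate) tuple, and that an opportunity is skipped whenever one of its levels is already used by a pending roundtrip. All names are hypothetical.

```python
class LevelLocks:
    """Tracks which orderbook levels are already targeted by pending orders."""

    def __init__(self):
        self._locked = set()

    def try_acquire(self, levels):
        """Lock all levels of an opportunity, or none if any is already taken."""
        levels = set(levels)
        if self._locked & levels:
            return False
        self._locked |= levels
        return True

    def release(self, levels):
        """Call once the orders for these levels are filled or cancelled."""
        self._locked -= set(levels)


locks = LevelLocks()
first = [("BTC/USD", "ask", 1001), ("ETH/BTC", "bid", 0.1), ("ETH/USD", "bid", 101)]
second = [("BTC/USD", "ask", 1001), ("LTC/BTC", "bid", 0.01), ("LTC/USD", "bid", 11)]
third = [("BTC/USD", "ask", 1002), ("LTC/BTC", "bid", 0.01), ("LTC/USD", "bid", 11)]

print(locks.try_acquire(first))   # nothing pending yet
print(locks.try_acquire(second))  # relies on the same BTC/USD ask @ 1001
print(locks.try_acquire(third))   # different rate level, can run concurrently
```

Compared to locking a whole market, this lets a second roundtrip run on the same market as long as it relies on a different rate level.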

Agent Smith vs the System

Exchanges are systems allowing us to play games (basically casino games some call investing), and like any other system they have rules. This is all fine and dandy, however Agent Smith doesn't really care about rules. It only cares about one thing above all else: being faster than Agent Jones. So it is towards a darker side of automated trading we turn, because as Morpheus puts it:

[Image: 38663742-ddade15e-3e61-11e8-8ff1-7c295ba2be43.png]

"Some rules can be bent, others can be broken."

The first trick is not breaking any rule, but it leads up to one of the main tricks up Agent Smith's sleeve.

> Poloniex error 422: Nonce must be greater than X. You provided Y.

When you create a program that does automated trading, you don't program it to use the website or a trading client. You use a specific gateway designed for programs called an API. Trading using the API on Poloniex requires your program to send an ever increasing number (a nonce) with every trade you submit (to prevent hackers from doing replay attacks). If you send an order you increase this number and Poloniex verifies that it is bigger than the one on the last order you sent. However we are sending three orders at the same time, and because of how these orders are routed through the internet and CloudFlare they might not arrive in the order they were sent, leaving us with the error above. As such Agent Smith didn't use 1 but 10 different API keys (each with their own nonce counter). Problem solved! The nonce error went almost completely away, but this brought us to the next problem:

> Poloniex error 429: Please do not make more than 8 API calls per second.
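Going back to the nonce workaround for a moment: the idea of spreading simultaneous orders over several API keys can be sketched like this. The key names are placeholders and the actual request signing and sending is left out; the only point is that each key keeps its own counter, so the three orders of a roundtrip never have to arrive in a particular order relative to each other.

```python
import itertools
import time


class ApiKey:
    """One API key with its own strictly increasing nonce."""

    def __init__(self, name):
        self.name = name
        # Starting from the current time in milliseconds is a common choice,
        # assumed here so a restarted process does not reuse old nonces.
        self._nonce = int(time.time() * 1000)

    def next_nonce(self):
        self._nonce += 1
        return self._nonce


class KeyPool:
    """Hands out keys round-robin so concurrent orders use different keys."""

    def __init__(self, names):
        self._keys = [ApiKey(name) for name in names]
        self._cycle = itertools.cycle(self._keys)

    def next_key(self):
        return next(self._cycle)


pool = KeyPool([f"key-{i}" for i in range(10)])  # hypothetical key names

# The three legs of one roundtrip each get a different key and nonce counter.
legs = []
for market in ("BTC_ETH", "USDT_ETH", "USDT_BTC"):
    key = pool.next_key()
    legs.append((market, key.name, key.next_nonce()))
print(legs)
```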

How many Agent Smiths?

[Image: 38663790-08ce24ca-3e62-11e8-9b34-d4dac1522ba5.jpg]

[Poloniex if you are reading this: I apologise, I meant no harm. Please take the millions of dollars I paid in exchange fees as the formal apology.]

Before going into the error above, let's take one last dive into the blueprint of the race track: a long leg of the race is waiting for Poloniex to send us the updates to the orderbook over the websocket connection. And after getting a feel for the behaviour of Agent Smith in various environments I found that Agent Smith sometimes performed a lot better than other times, seemingly randomly. What was changing? Measuring the server didn't turn up much and my code was performing very consistently. What about this websocket connection?

As stated before, computer systems can only do a single thing at a time. So if there are a hundred Agents all listening for orderbook updates they are not going to receive them at exactly the same time. One agent will get the update before other agents: all Agents are in a list of connections and as soon as an update is ready messages will be sent to everyone in the list (one after the other). This is not just a problem with Poloniex: bigger stock exchanges try to work around this by offering (expensive) colocation hosting in a datacenter next to the exchange where they use very expensive hardware and a fiber cable of exactly 100 feet for each customer to guarantee that the messages arrive at roughly the same time (we are talking microseconds or even nanoseconds here).

After reaching out to Poloniex about getting a faster feed (for example by getting a direct line bypassing CloudFlare) Agent Smith was getting hungry. The easiest way of trying to get the fastest websocket connection (per market) is to connect a ton of times and only keep the connection that delivers the same message the fastest (and drop all the other ones). However Poloniex doesn't like it when you open a hundred websocket connections (they think you are DDoSing them). You don't always get errors (like the one above), but you don't get the messages either.

So the solution: hook up different IP addresses to the server Agent Smith is running on (45 IP addresses to be exact) and rotate over them to create new websocket connections constantly and drop all the slow ones. Also use a pool of IP addresses for submitting orders, because the "8 API calls per second" is an IP limitation, not an account limitation.
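The "keep only the fastest connections" step boils down to comparing when each connection delivered the same message. Below is a minimal sketch of that comparison, without any networking. It assumes every update carries some identifier (here a sequence number) so that the same message can be recognised across connections; the function name and the data layout are invented for this example.

```python
from statistics import mean


def rank_connections(arrivals):
    """Rank connections from fastest to slowest.

    arrivals maps a connection id to {sequence number: receive time in seconds}.
    A connection's score is its average delay behind whichever connection
    delivered each message first, measured over messages seen by all of them.
    """
    if not arrivals:
        return []
    common = set.intersection(*(set(seen) for seen in arrivals.values()))
    if not common:
        raise ValueError("no message was seen on every connection yet")
    first_seen = {seq: min(seen[seq] for seen in arrivals.values()) for seq in common}
    lag = {
        conn: mean(seen[seq] - first_seen[seq] for seq in common)
        for conn, seen in arrivals.items()
    }
    return sorted(lag, key=lag.get)


arrivals = {
    "conn-a": {1: 0.010, 2: 0.020},
    "conn-b": {1: 0.012, 2: 0.019},
    "conn-c": {1: 0.030, 2: 0.040},
}
ranking = rank_connections(arrivals)
keep, drop = ranking[:1], ranking[1:]
print("keep:", keep, "drop:", drop)
```

In a running system the slow connections in `drop` would be closed and replaced by fresh ones, and the ranking repeated.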

The climax

[Image: 38663857-3b03cfc6-3e62-11e8-9eff-af5eeafd5ffc.jpg]

During the run up starting in November of 2017 the price was going insane and there were a lot of arbitrage opportunities for tri arb on Poloniex. I created a tiny mobile dashboard that showed some key metrics, here is a screenshot:

[Image: 38663929-6ddf1efa-3e62-11e8-929c-3c6233c4a774.jpg]

Legend:

- delta: a rough measure of profit since last restart (19 hours ago).
- trades: the number of trades (every roundtrip has a minimum of 3 trades) since the last restart.
- volume: the amount of money I moved since last restart. Yes, my tiny little frankenstein creation was moving around $10 million a day.
- last trade: the last roundtrip completed.

You want to see Agent Smith in operation? Here is a short video showcasing what Agent Smith was doing mid December 2017:

[Image: 38730336-46de2e5c-3f40-11e8-8082-e804ad461f9d.png]

(youtube link)

Unfortunately I have to report to my faithful readers that Agent Smith has gotten out of shape. For risk management reasons related to Tether USD (let's leave politics out of this) I reshuffled a lot of liquidity, causing Agent Smith to slowly drop a lot of fee tiers. On top of that we have also seen a shifting landscape, with Poloniex losing a lot of altcoin volume and new competitors like Binance (ref link) coming in to take over a lot of that volume.

But Agent Smith had a great run while it lasted, here is a performance chart (starting at a base level of 100%). The blue line is the value of the portfolio due to the market going up and down, the red line is the actual value due to arbitrage. This screenshot was taken just before Christmas: the markets were down a lot, but Agent Smith was able to arbitrage its way out of the whole debacle with some nice returns (around 45% in this particular week, arb and market combined; admittedly this was not an average week).

[Image: 38664217-25d603b6-3e63-11e8-9b08-9164369f5900.png]

Part 2

Agent Smith was the first system of its kind I built, but definitely not the last. I won't tell much about what I'm running now except that it has a name that is just as awesome: KNIGHT RIDER. Here's a very cheesy teaser:

[Image: 38664251-3d64c8dc-3e63-11e8-8c47-67a9de910004.png]
I won't say too much just yet, except that it is trading futures on Bitmex (ref link).


Let this be a lesson for all explorers out there, and let us remember Agent Smith in his glory days.

Are you interested in the world of automated trading on crypto markets? Have a look at the platform I am building called Gekko Plus. If you subscribe to the newsletter I'll be sure to send out more stories like this as soon as I write them.
#2
Awesome story !


A few months ago, when I started to have a look at cryptos, I also tried this kind of arbitrage, on Binance.
But I was not as involved as you were, and stopped after my first tests :(

Looking forward to your next adventures !
#3
it was wonderful
it was very super

https://gekkoplus.com/
#4
Congratulations, my friend, great story, perfect analysis.
we are waiting for gekkoPlus to have glory days

Signed: Willian, Brazil
thank you
#5
Hi askmike,

A friend of mine recently pointed me to this excellent post and I was immediately struck by how similar it was to my own experience!  So I thought I'd share a little about my own journey down this particular triangular shaped rabbit hole...

I only really started looking into crypto trading ideas in mid-Jan this year after a different friend had asked me to help out with some exchange connectivity for another project he was working on.  I'm a developer by trade in the financial world and not having had any experience in crypto-land I was keen to give it a whirl - always fun to learn new skills!

Being a naturally cautious person I was drawn toward arbitrage strategies instead of the usual alpha generation approaches, what can be better than making money risk free right?  After extensive in-depth research (ok, I spent about 20 mins googling it one evening) it was pretty clear the bulk of arb going on in the crypto space (or at least what people were prepared to talk about publicly) was all about exploiting price differentials between exchanges for common pairs like BTC/USD.  This didn't seem like a very good approach to me for all the same reasons you described.

As well as this my working assumption was that all of the exchanges were crooks and would use every trick in the book  (front running, spoofing etc) to ensure that these simple and common approaches would not be profitable.  Given that I decided to try for a less common arb strategy and so I decided to give triangular / three-way arb a try.

I decided early on that I wasn't going to get into the ultra low latency / co-location game as I was confident that would be a battle I could not win, given the exchanges can (and likely do) run such strategies themselves directly with far lower latency access and without fees.  Given this, and for simplicity, I started hacking out some code in Python.  As it turned out with some smart data structure choices (i.e. maintaining a realtime orderbook in O(1) for most cases) and by running my code 'near co-lo' (i.e. on the same AWS/DO datacenter as the exchange) I was able to achieve decent processing times and low-enough latency.  Ultimately the biggest challenges I faced were not related to performance - but I'll come on to those shortly.
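One possible reading of "a realtime orderbook in O(1) for most cases" is sketched below; this is a guess at the general idea, not the actual data structure from my bot. Each side of the book is a dict from price to size plus a cached best price, so updates and best-price lookups are constant time, and a full scan is only needed when the best level itself disappears. The class and method names are illustrative.

```python
class BookSide:
    """One side of an orderbook: price -> size, with a cached best price."""

    def __init__(self, is_bid):
        self.is_bid = is_bid
        self.levels = {}
        self.best = None  # highest bid or lowest ask, None when empty

    def update(self, price, size):
        """Apply one level update; a size of 0 removes the level."""
        if size == 0:
            self.levels.pop(price, None)
            if price == self.best:
                # Rare slow path: the best level vanished, rescan the prices.
                pick = max if self.is_bid else min
                self.best = pick(self.levels, default=None)
            return
        self.levels[price] = size
        if self.best is None:
            self.best = price
        elif self.is_bid and price > self.best:
            self.best = price
        elif not self.is_bid and price < self.best:
            self.best = price


bids = BookSide(is_bid=True)
bids.update(1000, 1.5)
bids.update(1001, 0.3)
print(bids.best)       # best bid after two inserts
bids.update(1001, 0)   # top level removed, triggers the rescan
print(bids.best)
```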

I initially coded up exchange APIs for Poloniex, GDax and HitBTC.  This brought the first significant challenge to light; every exchange uses bespoke and often bewildering conventions, many are poorly  documented and I found more than a few  bugs.  

For example Tether is confusingly called 'USD' on HitBTC rather than 'USDT' as used by most other exchanges.  There is no standard (or even de-facto common standard) convention for ticker pair naming (is it BTC-USD or USD_BTC or BTCUSD or "345"?).  I especially like the exchanges who name pairs without a delimiter when they list currencies whose codes are variable length! (BTCSTEEM anyone?).  

The solution is obvious - if tedious - develop a common standard for the bot to use and adjust all the calls in and out of the exchanges to account for their individual conventions and quirks.  In my opinion a crypto exchange API standard is badly needed given the number of exchanges out there - and no I don't mean use FIX!
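As an illustration of such a translation layer, here is a small sketch that maps exchange-specific pair names onto one internal BASE/QUOTE format. The three exchanges and their conventions are deliberately hypothetical stand-ins (a dash-delimited one, an underscore-delimited one that lists the quote currency first, and one without a delimiter that calls Tether "USD"); real conventions would have to be filled in per exchange.

```python
# Hypothetical per-exchange conventions, for illustration only.
CONVENTIONS = {
    "dash_exchange": {"delimiter": "-", "base_first": True, "aliases": {}},
    "underscore_exchange": {"delimiter": "_", "base_first": False, "aliases": {}},
    "plain_exchange": {"delimiter": "", "base_first": True, "aliases": {"USD": "USDT"}},
}


def split_pair(raw, delimiter, known):
    """Split a raw pair name into its two currency codes."""
    if delimiter:
        first, second = raw.split(delimiter)
        return first, second
    # No delimiter: try every cut and keep those where both halves are known codes.
    cuts = [
        (raw[:i], raw[i:])
        for i in range(1, len(raw))
        if raw[:i] in known and raw[i:] in known
    ]
    if len(cuts) != 1:
        raise ValueError(f"cannot split {raw!r} unambiguously")
    return cuts[0]


def normalize(exchange, raw, known):
    """Translate an exchange-specific pair name into internal BASE/QUOTE form."""
    conv = CONVENTIONS[exchange]
    first, second = split_pair(raw, conv["delimiter"], known)
    base, quote = (first, second) if conv["base_first"] else (second, first)
    aliases = conv["aliases"]
    return f"{aliases.get(base, base)}/{aliases.get(quote, quote)}"


known = {"BTC", "ETH", "USD", "USDT", "STEEM"}
print(normalize("dash_exchange", "BTC-USD", known))
print(normalize("underscore_exchange", "USDT_BTC", known))
print(normalize("plain_exchange", "BTCUSD", known))
print(normalize("plain_exchange", "BTCSTEEM", known))
```

The delimiter-less case only works with a list of known currency codes, and it raises instead of guessing when a name can be cut in more than one valid way.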

Even better is Poloniex which, as you will have seen, don't use their own published websocket API (WAMP protocol) but instead use an undocumented API which you have to reverse engineer with the help of forum posts scattered over the internet - anyone would think they didn't want 3rd party bots connecting to their service :)

Then there were the bugs.  I'm genuinely surprised how many bugs I encountered in the exchanges given how widely they are used.  One of the bugs I found on GDax is potentially very serious.  I filed bug reports like a good boy and they were of course duly ignored.  Luckily I was able to work around most of them.

The next interesting challenge was how to accurately measure latency to the exchange, as I would need the bot to stop executing if the latency grew too large for any reason (i.e. network or OS issues or perhaps simply because my bot's Python code was too slow).  Specifically I wanted to know how long it takes for a message to get from the exchange to my bot for processing.

Measuring the 'ping' time is obviously a useful starting point but I really wanted to understand how 'old' a given message is when I processed it given the various buffers and queues it will go through both within my code and OS network stack and outside on the network and even on the exchange itself.  Luckily HitBTC provide a fairly accurate 'timestamp' on the ticker messages which indicates when the message was sent and so I was able to measure how old each message was when I processed it.  Surprisingly (or perhaps not) not every exchange has a way to do this and in fact Poloniex does not that I could see (at the time I wrote it at least, I haven't looked recently).  I would be interested to know how you approached this problem?
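The message-age check described above might look roughly like the sketch below. It assumes the exchange timestamp is in milliseconds since the epoch and that the local clock is reasonably well synchronised (e.g. via NTP), otherwise the ages are meaningless. The class name, the window size and the use of a median are choices made for this example, not necessarily what my bot does.

```python
import time
from collections import deque
from statistics import median


class LatencyGuard:
    """Tracks how old messages are on arrival and pauses trading when too old."""

    def __init__(self, max_age_ms, window=20):
        self.max_age_ms = max_age_ms
        self.ages = deque(maxlen=window)  # only the most recent ages are kept

    def record(self, exchange_ts_ms, received_ts_ms=None):
        """Store and return the age of one message in milliseconds."""
        if received_ts_ms is None:
            received_ts_ms = time.time() * 1000
        age = received_ts_ms - exchange_ts_ms
        self.ages.append(age)
        return age

    def trading_allowed(self):
        """Allow trading only while the typical recent message is fresh enough."""
        if not self.ages:
            return False  # no measurements yet, stay on the safe side
        return median(self.ages) <= self.max_age_ms


guard = LatencyGuard(max_age_ms=100, window=3)
guard.record(exchange_ts_ms=1000, received_ts_ms=1020)
guard.record(exchange_ts_ms=2000, received_ts_ms=2030)
print(guard.trading_allowed())  # messages are 20-30 ms old
guard.record(exchange_ts_ms=3000, received_ts_ms=3500)
guard.record(exchange_ts_ms=4000, received_ts_ms=4600)
print(guard.trading_allowed())  # recent messages are half a second old
```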

Eventually I'd developed all this into a fairly solid async-only exchange-agnostic API  (operations not available via websocket are run as async GET/POST on a thread to keep everything event driven) which could be plugged into any bot 'strategy' such that the strategy code didn't have to care about any of these exchange specific things and could focus on the algorithm itself.  All fairly standard engineering type stuff.  I may open source this part of the code at some point as it is genuinely quite useful.

With that in place I (along with my friend who had dragged me into this world!) then wrote a very simple Python bot which would do the obvious thing: listen to ticker bid/ask prices over the websocket (not orderbook updates initially, I switched to that later when the 'liquidity' penny dropped - more on that later), then do the math and execute (as simulated orders) the profitable ones after accounting for fees & expected slippage.

One of the great things about the crypto world is that there are so many currency pairs that the number of three-way arb possibilities is huge! Rather than targeting a specific three-way triangle (say, USDT->BTC->ETH->USDT) I coded it to check every possible three-way permutation on every market tick! This took a little optimising to keep the cost of calculation down as there could be ~2000+ permutations on some exchanges which support a large number of currencies.
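
One hedged way to picture the "every permutation" check: enumerate the triangles once at startup, then index them by market so that a tick only re-evaluates the triangles it touches. The function names and the (base, quote) tuple format are assumptions for this sketch, not the bot's actual code.

```python
from itertools import permutations


def find_triangles(markets):
    """markets: iterable of (base, quote) pairs, e.g. ("BTC", "USDT").

    Returns every ordered cycle a -> b -> c -> a in which each hop has a
    market, regardless of which side is base or quote.
    """
    tradable = set()
    currencies = set()
    for base, quote in markets:
        tradable.add(frozenset((base, quote)))
        currencies.update((base, quote))

    triangles = []
    for a, b, c in permutations(sorted(currencies), 3):
        hops = ((a, b), (b, c), (c, a))
        if all(frozenset(hop) in tradable for hop in hops):
            triangles.append((a, b, c))
    return triangles


def index_by_market(triangles):
    """Map each market (as an unordered pair) to the triangles that use it."""
    index = {}
    for tri in triangles:
        a, b, c = tri
        for hop in ((a, b), (b, c), (c, a)):
            index.setdefault(frozenset(hop), []).append(tri)
    return index
```

On a tick for BTC_USDT you would then look up `index[frozenset(("BTC", "USDT"))]` and recompute only those triangles.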

This then brought up the next challenge: getting the maths right! It sounds easy in theory but I found it surprisingly difficult in practice. Why? One explanation (other than me being stupid and tired - this was an evening and weekend pet project) is that in crypto land we're always dealing with tiny fractional numbers (e.g. 0.000001 of currency X is worth 0.0000001 of currency Y) and so it is very hard to get an intuitive feel for when something is right or wrong. This is especially hard when you're trying to automatically deal with hundreds of currencies and thousands of three-ways!

Another complexity here is that to complete a three-way opportunity you typically have to buy two ticker pairs and sell a third, so there is some further mental gymnastics involved. For example to execute USDT->BTC->NXT->USDT you would have to first sell USDT to get BTC, but the actual exchange ticker is BTC_USDT (BTC is base currency, USDT is quote currency) and so you would instead have to buy the BTC_USDT ticker. You'd then (in parallel as you note) go on to buy NXT_BTC and sell NXT_USDT to complete the roundtrip.
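
The mental gymnastics can be pushed into one small helper. This is a sketch that assumes tickers are named BASE_QUOTE as in the example above; `hop_to_order` is a hypothetical name.

```python
def hop_to_order(src, dst, markets):
    """Translate 'spend src to receive dst' into (ticker, side).

    markets: set of (base, quote) tuples listed on the exchange.
    """
    if (dst, src) in markets:
        # dst is the base currency: buy base, paying with quote (src)
        return f"{dst}_{src}", "buy"
    if (src, dst) in markets:
        # src is the base currency: sell base, receiving quote (dst)
        return f"{src}_{dst}", "sell"
    raise ValueError(f"no market between {src} and {dst}")


markets = {("BTC", "USDT"), ("NXT", "BTC"), ("NXT", "USDT")}
path = ["USDT", "BTC", "NXT", "USDT"]
orders = [hop_to_order(a, b, markets) for a, b in zip(path, path[1:])]
```

For the path in the example, `orders` comes out as buy BTC_USDT, buy NXT_BTC, sell NXT_USDT, matching the description above.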

This stuff does make you question your sanity at times!

After running simulations for a while against a few exchanges I soon zoned in on HitBTC as it seemed to have the magic combination of low taker fees and a high frequency of profitable three-way arb opportunities, largely I think because of the huge universe of tickers it allows you to trade.

I  found that Poloniex was throwing up the occasional  arb opportunity but most lasted for less than 100-200ms and it was clear that someone else was running the same strategy as me at the same time (so thanks for that!).  Interestingly on both Poloniex and HitBTC I found that for very small opportunities (i.e. <0.1% profit after fees) the arb window lasted much longer suggesting that my competition were going after the bigger moves and so there was some scope for 'bottom feeding'.

Then came the first 'oh s**t' moment.

Until now I'd been looking for arb opportunities and simulating based on a fixed notional amount at the current best bid/offer prices. So for example I'd start with 100 USDT and simulate a three-way arb via two other currencies and end up back at USDT. Do I end up with more than I started with after fees? If so - great! We have a winner.
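
A sketch of that fixed-notional simulation. The prices in the usage lines are invented, and the fee model (a flat taker fee deducted from the proceeds of each leg) is an assumption; real exchanges differ in which currency the fee is taken from.

```python
def simulate_round_trip(start_amount, legs, fee_rate):
    """Push start_amount through each leg at the quoted price.

    legs: list of (side, price), price in quote currency per unit of base.
      "buy"  spends quote at the best ask and receives amount / price of base.
      "sell" spends base at the best bid and receives amount * price of quote.
    """
    amount = start_amount
    for side, price in legs:
        if side == "buy":
            amount = amount / price
        elif side == "sell":
            amount = amount * price
        else:
            raise ValueError(f"unknown side: {side}")
        amount *= 1 - fee_rate  # assumed: fee comes out of what you receive
    return amount


# USDT -> BTC -> NXT -> USDT with made-up prices
legs = [("buy", 8000.0), ("buy", 0.00002), ("sell", 0.1632)]
final = simulate_round_trip(100.0, legs, fee_rate=0.001)
profitable = final > 100.0
```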

The (now gapingly obvious to me) flaw here is of course that there may not be sufficient liquidity at the best bid/offer price to execute all 3 pairs for the notional amount.  To solve this I moved away from trying to roundtrip a fixed notional amount and instead worked out the 'highest common liquidity' between the 3 ticker pairs involved in the three-way transaction.  
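
One possible reading of 'highest common liquidity', sketched under assumptions: each leg exposes the size available at its best level in base currency, fees are ignored, and the result is expressed in the starting currency. The name `max_start_amount` and the numbers are illustrative only.

```python
def max_start_amount(legs):
    """Largest starting amount that fits through the top of book on every leg.

    legs: list of (side, price, size) where size is the base-currency quantity
    available at the best ask (for "buy") or best bid (for "sell").
    """
    limit = float("inf")
    rate = 1.0  # units of the current leg's input currency per unit of start currency
    for side, price, size in legs:
        if side == "buy":
            capacity_in = size * price  # input is quote currency
            out_per_in = 1.0 / price
        else:
            capacity_in = size          # input is base currency
            out_per_in = price
        limit = min(limit, capacity_in / rate)
        rate *= out_per_in
    return limit


# USDT -> BTC -> NXT -> USDT with made-up prices and sizes
legs = [("buy", 8000.0, 0.5), ("buy", 0.00002, 1000.0), ("sell", 0.1632, 500.0)]
usable_usdt = max_start_amount(legs)
```

Here the third leg is the bottleneck: only 500 NXT can be sold at the best bid, which caps the whole round trip well below what the first leg alone would allow.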

To do this I had to move beyond the simple feed of ticker prices (which contain only the best bid/ask, not the quantities) and instead consume the full orderbook so as to have access to the quantity of orders available at the best bid & ask.  This took a little work as I wanted to ensure I could maintain the orderbook efficiently and so avoided simple ordered lists or binary trees and the like.  Ultimately a simple solution would have been fine given ultra-low latency was never the goal, but still...
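
For reference, the 'simple solution' can be as small as a pair of dictionaries. This sketch assumes updates arrive as (side, price, new total size) with size 0 meaning the level is gone; that convention is common but exchange-specific. It is not the optimised structure described above.

```python
from decimal import Decimal


class OrderBook:
    """Price-level book: one dict per side, best price found on demand."""

    def __init__(self):
        self.bids = {}
        self.asks = {}

    def apply(self, side, price, size):
        """Set the total size at a price level; size 0 removes the level."""
        book = self.bids if side == "bid" else self.asks
        if size == 0:
            book.pop(price, None)
        else:
            book[price] = size

    def best_bid(self):
        if not self.bids:
            return None
        price = max(self.bids)
        return price, self.bids[price]

    def best_ask(self):
        if not self.asks:
            return None
        price = min(self.asks)
        return price, self.asks[price]


book = OrderBook()
book.apply("ask", Decimal("101"), Decimal("1000"))
book.apply("ask", Decimal("100"), Decimal("10"))
```

Prices are kept as `Decimal` so that the same level always maps to the same dictionary key. Finding the best price is a linear scan, which is fine for books of a few hundred levels.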

What I found is that after doing this the arb profitability reduced significantly, as it turns out that many of the currencies which regularly appeared in profitable three-ways have very poor liquidity on the orderbook and so this limited the 'highest common liquidity' I could transact across the three-way. Typically I found that executing an amount of around ~0.01 BTC was possible for two of the three pairs only.

Then came 'oh s**t' moment #2.  

When simulating I could quite happily buy and sell tiny fractions of a coin such that the amount I was buying or selling in each pair was roughly equal.  For example if the 'highest common liquidity' between the chosen three currencies was say $0.001 USD then I would buy / sell this much of the tickers in the three-way.  

However in the real world the exchanges have minimum order quantities and minimum order quantity increments (sometimes published, sometimes not and discovered by trial and error). After some analysis it turned out that on HitBTC the minimum order sizes, in $USD terms, vary dramatically between pairs (orders of magnitude different). Most notably the minimum order size and minimum increment for BTC based pairs is 0.01 BTC which is a coarse grained number. Essentially it means if the 'highest common liquidity' was 0.015 BTC then my choice is either to trade 0.01 BTC or 0.02 BTC.
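
A small sketch of snapping a desired quantity to an exchange's increment, using the 0.01 BTC figure quoted above. Rounding down (rather than up to 0.02) is a choice made here so the order never exceeds the liquidity seen on the book; the function name is hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def round_down_to_increment(quantity, increment, minimum):
    """Largest tradable quantity <= quantity, or 0 if below the minimum."""
    steps = (quantity / increment).to_integral_value(rounding=ROUND_DOWN)
    rounded = steps * increment
    return rounded if rounded >= minimum else Decimal("0")


tradable = round_down_to_increment(
    Decimal("0.015"), increment=Decimal("0.01"), minimum=Decimal("0.01")
)
```

With a wanted size of 0.015 BTC this yields 0.01 BTC, so a third of the opportunity is left on the table, and the other two legs then have to be sized against 0.01 BTC rather than 0.015.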

The upshot is that it makes performing genuine three-way arb essentially impossible, as the amount we trade in each pair is significantly lop-sided. I think this is an artifact of history: when HitBTC first set the minimum quantities they were likely all roughly in line, but over time prices have moved dramatically and so they have drifted apart and have not been reset.

This is the point where I decided to pause and move on to other more tantalising strategies rather than continuing with this strategy on a different exchange. It has served as a great learning opportunity and given me the foundation I needed to go on to work on some more interesting things - I'll leave that for another day!


Quote:But the really interesting part of this system is that having less fees means you are able to do more opportunities. Since opportunities that are not profitable with 0.25% might very well be with a 0.20% fee. This difference might not sound like a lot, but it is huge.
I couldn't agree more! Fees are a huge limiting factor for any market-taker based strategy. I didn't ever execute enough volume to move down the fee table but it's clear that is the way you have to go to unearth the profitable opportunities. Whoever said size doesn't matter was wrong :)
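
To put rough numbers on the quoted 0.25% versus 0.20% comparison, here is a back-of-the-envelope sketch. It assumes a three-leg round trip with the same proportional taker fee deducted on every leg.

```python
def break_even_gross_return(fee_rate, legs=3):
    """Gross price edge needed before fees for a round trip to break even."""
    return 1 / (1 - fee_rate) ** legs - 1


edge_at_025 = break_even_gross_return(0.0025)
edge_at_020 = break_even_gross_return(0.0020)
```

Under those assumptions a triangle needs roughly a 0.75% gross edge to break even at a 0.25% fee, and roughly 0.60% at a 0.20% fee. Every opportunity between those two thresholds is only available to the trader on the lower fee tier.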


Quote:The first trick is not breaking any rule, but it leads up to one of the main tricks up Agent Smith's sleeve.


> Poloniex error 422: Nonce must be greater than X. You provided Y.
I also came across this issue on Poloniex but, as I was largely targeting HitBTC at that point, didn't dig into it much so thanks for saving me the effort!

Thanks again for posting this, a great read and also nice to hear of other peoples experiences of coding trading bots in this magic-internet-money world.

Good luck with your index arb bot :)
#6
Nicely done review of the arb bot world. I have passed this on to others in the space. Thanks for sharing!
#7
Awesome story!

PS: Agent Jones = FujiApple ;)
#8
(04-22-2018, 09:59 AM)FujiApple Wrote: I decided early on that I wasn't going to get into the ultra low latency / co-location game as I was confident that wouldn't be a battle I could win given the exchanges can (and likely do) run such strategies themselves directly with far lower latency access and without fees.

I know this is commonly believed and it does make sense that they are doing it themselves. But everything I have seen so far points against this (on a few exchanges anyway). What I've seen for exchange infrastructures so far is that different matching engines (for different markets) run on different servers (or threads, probably servers on big exchanges). Meaning that if they want to do this themselves they need to monitor their other orderbooks async and maybe even do some kind of locking. I am confident the risk of slowing down their matching engine is much bigger than the money they would make here.

To put it in perspective: I've made some profit, but I paid at least 10 times that amount in fees. So on exchanges with decent fees they'll make a ton of money anyway. Why risk a lagging matching engine for an extra 10% (max during crazy bull runs)?

(04-22-2018, 09:59 AM)FujiApple Wrote: Measuring the 'ping' time is obviously a useful starting point but I really wanted to understand how 'old' a given message is when I processed it, given the various buffers and queues it will go through both within my code and OS network stack and outside on the network and even on the exchange itself. Luckily HitBTC provide a fairly accurate 'timestamp' on the ticker messages which indicates when the message was sent and so I was able to measure how old each message was when I processed it. Surprisingly (or perhaps not) not every exchange has a way to do this and in fact Poloniex does not that I could see (at the time I wrote it at least, I haven't looked recently). I would be interested to know how you approached this problem?

If the API does not offer such a timestamp, pinging the host of the API does not work in my experience (as you are simply pinging Cloudflare, their CDN or their load balancers). I usually measure different API calls over a period of time as well as other metrics (in the case of AGENT SMITH, success metrics: how often was I able to take crossing orders from the book before anyone else).

(04-22-2018, 09:59 AM)FujiApple Wrote: Eventually I'd developed all this into a fairly solid async-only exchange-agnostic API (operations not available via websocket are run as async GET/POST on a thread to keep everything event driven) which could be plugged into any bot 'strategy' such that the strategy code didn't have to care about any of these exchange specific things and could focus on the algorithm itself. All fairly standard engineering type stuff. I may open source this part of the code at some point as it is genuinely quite useful.

Please do! I would be very interested in this. I am actually doing something similar myself with Gekko Broker, see here: https://github.com/askmike/gekko/tree/ge...kko-broker

(04-22-2018, 09:59 AM)FujiApple Wrote: The (now gapingly obvious to me) flaw here is of course that there may not be sufficient liquidity at the best bid/offer price to execute all 3 pairs for the notional amount. To solve this I moved away from trying to roundtrip a fixed notional amount and instead worked out the 'highest common liquidity' between the 3 ticker pairs involved in the three-way transaction.

Yes, you always want to watch the orderbook (at least a few levels down) instead of the ticker for this exact reason. The other big reason is that you can (though it is quite complex to do) aggregate a few levels down, say take both the top ask and the second ask, to take more volume. Agent Smith was doing this for a while, but it becomes harder to track whether you were fast enough:

If these are the top asks:

10 @ 100
1000 @ 101

And you want to arb 20 for an average price of 100.5 (by taking the first ask and 10 of the second).

You can do this in a single order (preferable). However, figuring out whether you were fast enough becomes harder, since if you posted a limit order to buy 20 at 101 it would get filled even if the top ask is gone (at a worse price than you projected).
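
The example above can be sketched as a walk down the ask side. The function name is hypothetical, and the asks are assumed to be sorted by ascending price.

```python
def sweep_asks(asks, quantity):
    """Projected cost of buying quantity by walking the ask levels.

    asks: list of (price, size) sorted from cheapest to most expensive.
    Returns (average_price, limit_price), or None if the book is too thin.
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    cost = 0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            # limit_price is the deepest level the single order has to reach
            return cost / quantity, price
    return None


projected = sweep_asks([(100, 10), (1000 and 101, 1000)], 20)
```

For the book above this gives an average of 100.5 with a limit price of 101. If the top ask has vanished by the time the order lands, the same limit order still fills but entirely at 101, so one hedged way to tell whether you were fast enough is to compare the realised average fill price with the projection.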

(04-22-2018, 09:59 AM)FujiApple Wrote: However in the real world the exchanges have minimum order quantities and minimum order quantity increments (sometimes published, sometimes not and discovered by trial and error). After some analysis it turned out that on HitBTC the minimum order sizes, in $USD terms, vary dramatically between pairs (orders of magnitude different). Most notably the minimum order size and minimum increment for BTC based pairs is 0.01 BTC which is a coarse grained number. Essentially it means if the 'highest common liquidity' was 0.015 BTC then my choice is either to trade 0.01 BTC or 0.02 BTC.

Yes, this is a big problem. I didn't go into detail in the article but it all becomes pretty ugly very soon. Luckily on Poloniex specifically this is not really a problem since their minimums are tiny: eventually I settled on a simple filter that would not arb opportunities smaller than something like 15 bucks. Not just for the reason above, but also because you don't want to arb 10 dollars if you could wait one tick to arb 1000 dollars over the same market instead.
#10
nice read. really interesting.