"We risk becoming the best informed society that has ever died of ignorance"
- Rubén Blades

"You can't make up anything anymore. The world itself is a satire. All you're doing is recording it"
- Art Buchwald

"It's getting exciting now, two and one-half. Think of everything we've accomplished, man. Out these windows, we will view the collapse of financial history. One step closer to economic equilibrium"
- Tyler Durden

"It is your corrupt we claim. It is your evil that will be sought by us. With every breath, we shall hunt them down."
- Boondock Saints

Wednesday, October 26, 2011

A Chat With "The Man": An Interview With Nanex Founder Eric Hunsader

In this interview for the HFT Review, Mike O’Hara talks to Eric Scott Hunsader, Founder of Nanex LLC, the US data feed company, about the explosion of data in the US financial markets, conflicting reports around the causes of last year’s “Flash Crash”, some problems with current academic research into HFT and how to improve the signal-to-noise ratio in today’s electronic markets.
HFTR: Eric, Nanex has become known in the last eighteen months or so for its extensive research into US market events and phenomena. But can you give us a quick overview of what is Nanex’s core business? 
EH: We’re a feed aggregator. We take in the feeds from all the US equities, options & futures exchanges, we normalise that data, add a very accurate timestamp on it, compress it about 20:1, which is really what gives us our niche, then we send real-time, delayed and historic data out to subscribers. 
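Nanex’s actual codec isn’t public, but the reason tick data compresses so well is that successive quotes differ only slightly. A minimal delta-encoding sketch in Python (the tuple layout and field units here are illustrative assumptions, not Nanex’s format):

```python
import struct
import zlib

def delta_encode(ticks):
    """Delta-encode (timestamp_us, price_cents, size) tuples.

    Successive quotes differ by tiny amounts, so the deltas are small
    integers that a general-purpose compressor squeezes very well.
    """
    out = bytearray()
    prev_t = prev_p = prev_s = 0
    for t, p, s in ticks:
        out += struct.pack("<iii", t - prev_t, p - prev_p, s - prev_s)
        prev_t, prev_p, prev_s = t, p, s
    return zlib.compress(bytes(out), 9)

def delta_decode(blob):
    """Invert delta_encode, recovering the original tick tuples."""
    raw = zlib.decompress(blob)
    ticks, t, p, s = [], 0, 0, 0
    for i in range(0, len(raw), 12):
        dt, dp, ds = struct.unpack_from("<iii", raw, i)
        t, p, s = t + dt, p + dp, s + ds
        ticks.append((t, p, s))
    return ticks
```

On realistic tick streams, where prices oscillate within a narrow band and timestamps advance by microseconds, the delta stream is dominated by repeated small values, which is what makes double-digit compression ratios plausible.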
HFTR: Who typically are your customers? Retail investors? Institutions? Prop traders? Hedge funds? 
EH: We have a little of each, and the reason for that is that we’ve never really gone out and marketed or advertised; we kept things quiet for a while. I’ve been doing this since the mid-eighties, when I was collecting S&P 500 futures data on floppy disks and selling it on a BBS. In the mid-nineties, I developed a real-time streaming charting service and partnered with Quote.com. There, we went from zero to about 10,000 paying subscriptions in about fourteen months, and it was one of those overwhelming growth stories where the huge influx made it impossible to do anything further with the software. So I wanted to grow this one quietly. People started trickling in through word of mouth, from all areas of trading, because we didn’t specifically target any one group. We supply data for Ameritrade, we’ve got companies like Peak Six who use us as a backup or secondary feed, we’ve got prop shops, exchanges (the NYSE is reselling our OPRA options data, for example), and we have retail guys who trade out of their houses. 
HFTR: I originally came across Nanex when I stumbled across your “HFT Crop Circles”, which is an interesting analogy of patterns you were seeing in market data as a result of algorithmic activity. What was it that caused you to start looking at these patterns and commenting on them? 
EH: It was the Flash Crash, the lack of any movement going on and the fact that we had all the tools in place to really make a couple of grand sweeps through the data and put together some random thoughts to flesh out what was really happening. When we did that, we noticed that some stocks had thousands of quotes per second with no trades, and we’d never really seen that before. 
HFTR: So if we look at the Flash Crash specifically as a starting point. There’s been a great deal written about the events of May 6th last year and there are various opinions about what actually happened, some better informed than others. The SEC and the CFTC came out with their joint report on October 1st last year and then you came out with your own report, which recognized a lot of what the SEC & CFTC were saying, but your analysis indicated there was more to it than met the eye. Can you take us through that? What did you see that the SEC and CFTC missed? 
EH: They described everything that happened either after the bottom was reached, or as a result of that big drop. Nothing they talked about was unique or had anything to do with what actually tipped the market over and got it snowballing downhill. They talked about stub quotes for example, but if you simply look at the data, you’ll find that the overwhelming majority -- really, any that mattered -- executed after the bottom. Same with LRPs (Liquidity Replenishment Points). So all of the things they focused on were problems with the market after the crash, which isn’t the same thing as discussing what caused the crash. They’re two different things. 
We did put our initial analysis out a couple of weeks after the May SEC/CFTC report and we were thinking about producing a follow-up after theirs was published. Then out of the blue, Waddell & Reed sent us all 6,438 eMini trade execution reports, which totalled 75,000 contracts.
HFTR: These were the executions highlighted in the SEC/CFTC report as being what triggered the downward move, correct? 
EH: That’s right. The SEC/CFTC report stated that the execution algorithm used by the “large seller” (i.e. Waddell & Reed) gave no regard to price or time, but once I started looking through the data, it became pretty clear that the execution algorithm did in fact use both time and price. So I started wondering: had the SEC & CFTC really looked at this data? That led me to ask questions about the algo of the guy at Barclays, the execution broker for Waddell & Reed, who verified the trade executions sent from W&R. When the question came up of why Barclays didn’t clearly explain to the SEC & CFTC during the investigation interview that the execution algo does in fact use time and price, I received the stunning answer: “They never interviewed us.” 
After analyzing those trades that Waddell & Reed gave us and talking to Barclays, we wrote our own report stating that this algorithm was clearly mis-characterised, there’s no way that the algo the SEC/CFTC talk about in their report matches up with this at all.
The very next week, the CFTC invites Barclays to a meeting (and this is on record on the CFTC website), which, it turns out, was the first time the algo was discussed in detail. So first they finger Waddell & Reed and paint this execution algorithm as something that went wild, and then they finally interview the guys who actually executed it!
HFTR: So can you take us through your analysis of what happened? 
EH: Well, first of all, that algorithm only sold on the offer side, it never hit the bid. So it couldn’t really push the market down. Of course it could place a lot of weight on the offer, but it would turn off randomly for 30 seconds at a time, making it hard to detect. In fact, looking at the CME time & sales data, having seen Waddell & Reed’s trades myself, and thinking about how I would deduce that a single entity was behind them, there was no way that thing was detectable unless somebody else had some extra information. It was pretty well hidden. Most orders had small size; the average was around ten contracts per trade. 
When we looked at the market meltdown that afternoon, at 14:42:44, there was this event in the e-mini in Chicago where somebody sold a few thousand contracts right through the book, like “I want to sell these now!” At that exact moment in New York (and not 14 milliseconds later, which is how long it takes light to travel from Chicago), somebody sold an equal amount of SPY, QQQ, MMM, DIA, all the big index ETFs that cover the market. There were also a number of large cap stocks that got hit too. So this wasn’t an arbitrage reaction to what was happening in Chicago, it was simultaneous selling -- had to be the same seller, or a fantastic coincidence. When that event went off, pretty much every ETF and index, and all their components, plus all of the related option chains re-priced across the board, saturating every data feed, both CQS and UQDF, as well as the premium direct feeds.
That was the moment when the Consolidated Quotation System (CQS) hit saturation point, and when OpenBook (the NYSE direct feed) hit its highest message traffic for the day. From then on, traffic actually declined significantly. At that point, the market still had another 600 points to fall, so we hadn’t yet entered the bizarre “hot-potato” phase. Something caused that surge in quote traffic, even on the direct feeds, to hit levels that gave people pause. 
After the Flash Crash and after we’d identified that point at 14:42:44, we looked for dates with similar events and April 29th popped up, where a similar thing had happened. Somebody went off and sold a bunch of futures contracts and at the same exact time hit all the ETFs. So then we went back in time, we went back through our database to 2006 and found that this was an extremely rare event. We would see it every once in a while on Fed announcement days or key news points, but never just out of the blue like that.
So we thought maybe someone was testing out a new algorithm that had found a way to be ahead of the arbitrage. Because if you’re selling a large amount of e-Mini futures, you know what’s going to happen in New York, it’s a given. If the futures drop here in Chicago, you know the ETFs in NY will drop too. What if, by the time the orders responding to the event in eMinis in Chicago got to New York (which takes about 14 milliseconds), you’d already cleared out the ETF (SPY, QQQ, IWM, etc) books? That’s when it dawned on us! Wow, that’s a bold strategy! So we labelled that thing the disrupter and we’ve been monitoring for those specific types of conditions ever since that time.
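The timing argument above — a New York print landing less than the one-way travel time after a Chicago print cannot be an arbitrage reaction — reduces to a single comparison. A sketch in Python (the helper name and millisecond units are assumptions for illustration):

```python
# One-way Chicago-to-New York latency floor cited in the interview.
LIGHT_FLOOR_MS = 14.0

def could_be_reaction(t_chicago_ms, t_newyork_ms, floor_ms=LIGHT_FLOOR_MS):
    """Return True if the New York event happened late enough to have been
    a reaction to the Chicago event.  False means the two sell waves must
    have been initiated independently -- or by the same party."""
    return (t_newyork_ms - t_chicago_ms) >= floor_ms
```

On May 6th, the ETF selling in New York landed at effectively zero offset from the eMini selling in Chicago, which a check like this classifies as non-reactive — the basis for concluding it was the same seller.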
HFTR: Since then, how regularly have you seen this kind of event occurring? 
EH: Whenever there’s a Fed announcement, whenever there’s a big news day, every so often it pops up, but it doesn’t usually get out of control. On May 6th 2010, however, we saw it hit three times in a row: at 14:42:44, about three seconds later, and another three seconds after that. Since then, we’ve never seen it run three times back-to-back like that, but in August of this year it started to occur more frequently. The monitor we have for it makes a little sound when it goes off, and that thing started going off all the time! It’s gone from maybe once a week, to once every thirty minutes or so, to all of a sudden pretty regularly, every couple of minutes. 
HFTR: Does this kind of thing constitute market manipulation in your book? 
EH: Absolutely! 
HFTR: In that case, what do you think the SEC & CFTC should be doing about it? 
EH: Well, I pulled the exact sequence numbers of the orders and trades that were involved in both the April 29th and May 6th events and I sent them to the regulators. But I never heard back. To be honest with you, they never really seemed that interested. 
HFTR: How concerned are you by that? Are the regulators taking your research seriously, do you think? 
EH: To be honest, when I went out and visited them last year, my conclusion was that either they were very behind the times and not very bright or they were completely owned by Wall Street. I couldn’t understand why they wouldn’t even look at this. Some of the things we sent them while they were working on their Flash Crash report, they never even read, which doesn’t make any sense to me. If somebody really wants to find the cause, why would you discard a lot of free work and in-depth data? At least look at it! 
HFTR: Are there other firms or maybe academics backing up your data with similar work? 
EH: Well, we always hoped that academia would catch on and now they actually are, in fact Jonathan Brogaard was just at my office the other week. And I was over at the Foresight round table in the UK recently, where I was able to talk to and give our data to a lot of people in academia. It’s just taking time. A lot of the things we’ve found will be replicated and slowly people will realise that maybe we weren’t just out there with “deep throat” conspiracy theories that the exchanges tried to paint us with initially. 
The thing is, I don’t publish anything unless I’m absolutely sure that I’ve checked everything, I can back it all up with hard data and I know for a fact that our statistics are solid. We haven’t retracted a single thing we’ve ever published. And we don’t like to draw conclusions, we prefer to just throw out what we know and let somebody else connect the dots. However, we’ve since realized that for a lot of the media, you really have to walk them through it.
But some things are just so blatantly obvious, like this whole quotes-per-second business. We now see stocks that regularly have 10,000-plus quotes per second, which means you can’t have more than a few dozen such symbols before you tip the whole system over. It just doesn’t make any sense any more.
HFTR: Is there any particular academic work you think is of value (or otherwise)? 
EH: Well, in one of your recent interviews you spoke to Professor James Angel of Georgetown University, and in that interview, when you were discussing the SEC/CFTC joint report on the Flash Crash and our own findings, he talked about his “deep throat sources not being able to substantiate the Nanex allegations”. Then on the very next page, he’s in fact saying exactly what we said, using our exact “allegations” to support what he said happened on that day. That is just bizarre and doesn’t make any sense at all!
For example, he talks about the fact that the feeds got delayed, which caused firms to pull out. If I can take you back to June 23rd 2010 when we published our first report on this, nobody thought that any exchange feeds were delayed or had problems. In fact, Larry Liebowitz of the NYSE testified before Congress (twice I think), saying that there were no system problems. So when we came out with our findings on June 23rd, people were saying we didn’t know what we were talking about, there were no system problems, etc.
It wasn’t until September 2010 that the NYSE finally admitted that yes, their systems did have problems and got delayed up to 30 seconds on about half of the symbols they were trading.
The thing that’s very troublesome is that some of these academics are absolutely clueless about what’s going on in the marketplace. So much has changed in trading data just in the last three to six months. What used to unfold over the course of a second now unfolds over the course of 25 milliseconds. For example, we dissected one YHOO trade recently that looked like a bad price spike, but when you dig down into it, you find that there’s a ton of orders and trades there, which drove the stock up a good seven or eight per cent in less than 20 milliseconds. And the timestamps make it appear as if trades were executing before the quotes that could have caused them!
One of the other problems with a lot of the academic studies is that, because the data is so overwhelming, they tend to reach for manageable samples they can work with, so they’ll look at maybe one-second samples or longer. The problem is, the way trading happens today, in one second there might only be 25 milliseconds that have anything you really need to look at, and the other 975 milliseconds are nothing. So when you average together that whole second, you’ve diluted things by about 40:1. 
So when these stocks go rocketing in a 25 millisecond period of time, something like SPY, which is usually just a penny spread, might be three or four cents during this period of time, then will come right back to a penny again once it’s all done. If you look at the one second average, it’s a penny. If you look at it when it’s really active, it’s three or four pennies.
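The dilution arithmetic is easy to reproduce. A synthetic sketch (the numbers mirror the SPY example above; this is illustrative, not real market data):

```python
def avg_spread(quotes):
    """Average spread over a list of (timestamp_ms, spread_cents) samples."""
    return sum(s for _, s in quotes) / len(quotes)

# One synthetic second: 975 ms at a 1-cent spread, then a 25 ms burst
# where the spread widens to 4 cents.
quiet = [(t, 1) for t in range(975)]
burst = [(975 + t, 4) for t in range(25)]
second = quiet + burst

overall = avg_spread(second)   # ~1.075 cents: the burst is nearly invisible
burst_only = avg_spread(burst) # 4 cents: the spread traders actually faced
```

Averaged over the full second, the 4-cent burst moves the figure by less than a tenth of a cent, which is exactly how a paper sampling at one-second resolution can report narrow spreads while missing the moments that matter.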
When you see these papers that say spreads have narrowed, for example, every single one of them is based on data that is either a) at least a couple of years old, or b) sampled over periods of time so long that all of the valuable information within is lost.
HFTR: Are you willing to offer your data out to the academic community to help them produce more accurate findings? 
EH: Yes, definitely. Any time a new paper comes out, I’ll contact the author and first ask what data set they used and then I’ll ask if they’re interested in bringing their paper up to date, if they need help with data, whatever they need so they can focus on what’s really going on in these really short periods of time. 
We make the May 6th data set available free for whoever wants it. Every few weeks we get either someone who’s working on their PhD or maybe a professor who has published before who wants more information.
A lot of this data we have to look at anyway in the normal course of our business, because customers will have questions, but our goal is to get academia to take over all this research that we’re doing and to never have to publish a research paper again!
HFTR: You clearly think that there is a lot wrong with the microstructure of the US markets right now. What can be done to fix things? 
EH: One of the biggest things that we think needs to occur is this whole business of properly time stamping the data at the earliest point in its lifecycle. If they just kept the timestamps from when the quote or trade was originated, that would go a long way to solving lots of problems. A lot of things would be acceptable because you would at least be able to detect exactly what the delays were at any given point. Right now, it’s impossible for anyone to know what those delays are; you can only estimate them based on traffic flow etc, because CQS doesn’t put the timestamp on until they finally get the quotes out of their system, ready to be transmitted. There can be a massive difference between when the data is originally generated and when it goes out. And that difference crops up all the time on very short timeframes. 
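If the originating exchange’s timestamp survived consolidation, detecting delay would be a one-line subtraction at the receiver. A hypothetical sketch, assuming each quote carries both the origin and the consolidated timestamp (a layout CQS does not currently provide — that is the complaint):

```python
def flag_delayed(quotes, threshold_ms=500):
    """quotes: iterable of (symbol, origin_ts_ms, consolidated_ts_ms).

    With the exchange's original timestamp preserved, the consolidation
    delay is simply the difference between the two clocks, so any symbol
    running behind the chosen threshold can be flagged immediately."""
    return sorted({sym for sym, t_origin, t_out in quotes
                   if t_out - t_origin > threshold_ms})
```

A monitor like this is what makes delays detectable rather than something you have to infer from traffic flow — which is the whole point of the proposal.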
HFTR: You recently published a proposal entitled “Coexisting without Colocating – A Proposal for Improving the Market for Everyone”, which advocates two things: i) keeping the original exchange-generated timestamp of the quote instead of overwriting it with the CQS timestamp and ii) creating a new quote type called “immediate”, which doesn’t update the NBBO, so the NBBO is only updated by quotes that remain in force for a minimum period (e.g. 250 milliseconds). Can you explain your thinking behind that proposal? 
EH: Certainly. I’m not against high frequency trading; I’m actually a very pro-technology kind of guy. I’m just against high frequency noise and quoting: this whole business of putting out quotes in order to fool or manipulate other systems into doing something that would expose their hand and their working algorithms. That is the kind of behaviour that needs to get nipped. It’s fine if somebody does that every once in a while, but it’s not fine if everyone is doing it, because that’s what’s behind the explosion in quote traffic. 
It’s this business of changing the quote but not even changing the price. So let’s say that the best bid is 100.00, the best offer is 100.01, and the size of the best bid is 50 shares, then 55 shares, then 53 shares, then 51 shares, etc, ten thousand times a second. With that kind of behaviour, the incremental cost to the sender of those bids is minimal. It’s not particularly difficult to write software to make that happen, it doesn’t take a lot of “horsepower”, it’s actually very simple from the sender’s perspective. But with the millions of subscribers who take in CQS every year, every single one of those downstream systems has to waste CPU cycles to look at this stuff. So if you’re putting out a quote that you have no intention of executing on, you have no business being hooked up to the exchange and the NBBO.
It’s high frequency spam. There’s very little cost to the sender but there’s a huge cost downstream. And the thing about spam is that nobody thinks it is a good thing except for the spammer.  If we had accurate timestamps on all the quotes coming through the NBBO, and you knew that quote was going to be there for say 250 milliseconds from that timestamp, and the only reason you weren’t going to be able to execute on that quote was because somebody else beat you to it, that would significantly change things for the better. We would get our diversity of participants back in the market.
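The proposed minimum quote life translates to a simple filter over a symbol’s quote stream. A sketch, assuming a quote’s life ends when the next quote for that symbol arrives (the function name and tuple layout are illustrative):

```python
def nbbo_eligible(quotes, min_life_ms=250):
    """quotes: time-ordered (ts_ms, bid, ask) tuples for one symbol.

    Only quotes that remain in force for at least min_life_ms update the
    NBBO; shorter-lived flickers are treated as 'immediate' quotes and
    dropped.  The final quote is assumed to rest long enough to qualify."""
    eligible = []
    for i, (ts, bid, ask) in enumerate(quotes):
        next_ts = quotes[i + 1][0] if i + 1 < len(quotes) else ts + min_life_ms
        if next_ts - ts >= min_life_ms:
            eligible.append((ts, bid, ask))
    return eligible
```

Under a rule like this, the ten-thousand-updates-a-second flicker described above simply never reaches the NBBO, while any quote a participant genuinely intends to honour passes through untouched.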
When everything is about speed, you lose a lot of diversity; you don’t have all those different players with different viewpoints using different algorithms and different strategies at play. You end up killing all of them; it’s all shoot first and ask questions later.
The one thing that really needs to be addressed urgently is this business of timestamps. People say it’s too difficult, but that is such utter nonsense. Accurate time stamping is a solid science that has a long track record; I’ve been doing it myself for at least a decade. It’s reliable, resilient, it doesn’t cost a lot to implement, and it will immediately allow people to detect exactly how things are delayed.
That will go a long way to people not feeling so manipulated, because now they will be able to take action against it. It’s this whole business of saying, “it’s too hard” or “it will be too expensive”, which is something I’d really like to have an open discussion about.
HFTR: Maybe we can get such a discussion going here at the HFT Review. Thanks Eric.