Introducing Model Version 3 of Statistical Ratings

June 26, 2024

Statistical Ratings are part of Agio Ratings' product offering. The ratings use publicly available time series, such as on-chain reserves, self-reported trading volumes and age, to predict the probability that a counterparty will default within the next twelve months. They were first published for a group of centralized exchanges in July 2022. Back then we somewhat controversially estimated FTX to be just under 10x riskier than Coinbase because it was smaller, younger and based in the Bahamas.

Since that first publication we’ve been working to improve the predictive accuracy and coverage of the ratings.  We are therefore excited to announce the release of Model 3 (v3).  V3 represents a significant upgrade on v2.  For example, it’s trained on twice as many defaults, has twice as many explanatory variables and covers 50% more counterparties (see appendix for the full listing).  The purpose of this post is to provide more details on v3 and how it has improved on v2.

Explanatory Variables

Figure 1 gives an overview of the explanatory variables. V3 has inherited all but one of the variables used in v2. We dropped coin-pair diversification, a self-reported measure of how exposed a given exchange was to a restricted number of markets, since it was comparatively less predictive. Meanwhile, the seven variables we carried over have all been further optimized to improve their accuracy. For example, we’ve retained average on-chain reserves as an explanatory variable, but we’ve adjusted the window used for averaging: a shorter window isn’t sufficiently stable, while a longer window reacts too slowly.
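The window trade-off described above can be illustrated with a minimal sketch. This is not the ratings' actual implementation: the function name `rolling_mean`, the window lengths and the reserve figures are all hypothetical, chosen only to show how a short window overreacts to a one-day spike while a long window lags a genuine step change.

```python
from collections import deque

def rolling_mean(values, window):
    """Trailing mean over the last `window` observations.

    Returns None for positions where the window has not yet filled.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out

# Hypothetical daily reserve balances: a one-day spike, then a lasting step down.
reserves = [100, 100, 160, 100, 100, 100, 60, 60, 60, 60]

short = rolling_mean(reserves, 2)  # follows the spike closely (less stable)
long = rolling_mean(reserves, 5)   # smooths the spike but lags the step down

print("2-day:", short)
print("5-day:", long)
```

In this toy series the 2-day average jumps on the spike and settles at the new level almost immediately, whereas the 5-day average dampens the spike but is still above the new level on the final day.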

To these seven variables we’ve added another ten, some drawn from entirely new categories of explanatory variable. For example, the rating now contains two variables that track operating leverage, in this case the amount of business a firm writes relative to its size. Likewise, there are new variables that capture growth rate and the macro-environment. This last factor helps to calibrate the default forecasts to the aggregate actual outcome over time: in general, there were many more defaults in 2022 than in 2023.

Figure 1: Predictive Variable Overlap

Predictive Performance

As noted above, v3 has been trained on twice as many defaults. Over time we’ve unearthed more data on prior defaults, and there have been some new defaults along the way. One method for measuring the performance of a discriminant analysis is the Area under the Curve (AuC) of the receiver operating characteristic (RoC) graph[1]. An AuC of 1.00 indicates a perfect model that is always correct, and an AuC of 0.50 indicates a random forecast that performs no better than chance. The typical score of a credit risk model used in TradFi is in the 0.65 – 0.85 range.
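For readers unfamiliar with the metric, the AuC can be read as the probability that a randomly chosen defaulter receives a riskier score than a randomly chosen survivor. The sketch below computes it by direct pairwise comparison; the function name `auc` and the sample scores are hypothetical and are not drawn from the ratings data.

```python
def auc(scores, defaulted):
    """Area under the RoC curve via pairwise comparison.

    Equals the share of (defaulter, survivor) pairs in which the
    defaulter has the higher risk score; ties count as half.
    """
    pos = [s for s, d in zip(scores, defaulted) if d]
    neg = [s for s, d in zip(scores, defaulted) if not d]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted default probabilities and observed outcomes (1 = defaulted).
scores = [0.30, 0.10, 0.08, 0.05, 0.20]
defaulted = [1, 0, 1, 0, 0]

print(round(auc(scores, defaulted), 3))
```

The pairwise form is quadratic in the number of firms, which is fine for a few dozen counterparties; a rank-based formula gives the same number more efficiently on larger samples.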

Against this metric v3 offers a significant improvement over v2, with an AuC score of 0.77 compared to 0.71 for the earlier ratings. Furthermore, v3 retains the characteristics that demonstrated v1 and v2’s ex-post validity in the years since they were first published. Firms that experienced default events, such as FTX, Bittrex and Gemini (all of which were called out by v1 and v2), also attract riskier ratings in v3. Likewise, Binance US’s v2 rating deteriorated substantially after the CFTC announced its lawsuit on March 27, 2023, effectively shutting down those operations, while Binance’s rating wasn’t as badly affected. V3 replicates that outcome.

Figure 2: Coinbase Cross-Validation

Another test of v3’s performance is whether it’s cross-validated by observable market forecasts. For example, Coinbase has both publicly traded bonds, whose yields contain a credit component, and publicly traded equity. The equity price also contains an implied default risk: given a firm’s debt levels, the higher the market capitalization, the higher the enterprise value and the lower the default risk[2]. It has been argued that the equity price actually contains more default risk signal than the debt yield since, given the seniority structure of a firm’s liabilities, equity is the most exposed asset. There’s also typically more trading depth in a firm’s stock, with more traders and greater volumes.

Figure 2 shows how v3’s default forecast has tracked Coinbase’s equity price over time with remarkable accuracy. The 1-year probability of default measured by our Statistical Ratings started at around 4.0% and then began to improve towards the end of 2023, eventually reaching 2.5%. This closely tracks the inverse of the share price, measured by -ln(equity price), where ln is the natural log (so -4.0 is equivalent to e^4, roughly $55 per share, and -5.5 is equivalent to e^5.5, roughly $245 per share, a higher stock price and therefore lower risk).
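The mapping between the share price and the -ln(equity price) axis can be checked with a few lines of arithmetic. The helper names below are hypothetical; the sketch only restates the transform described above.

```python
import math

def neg_log_price(price):
    """Map a share price in dollars to the -ln(price) axis value."""
    if price <= 0:
        raise ValueError("price must be positive")
    return -math.log(price)

def price_from_axis(axis_value):
    """Invert the transform: recover the share price from an axis value."""
    return math.exp(-axis_value)

# The two axis values quoted in the text.
for axis_value in (-4.0, -5.5):
    print(axis_value, round(price_from_axis(axis_value), 2))
```

Because the transform is a negated logarithm, a higher share price always maps to a lower (more negative) axis value, which is why the series moves in the same direction as the default probability.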

The day-to-day correlation between these two time series, with an overall R² of 85%, is outstanding and strongly corroborates v3’s higher accuracy. The relationship between the v3 time series and Coinbase’s bond yield credit spreads is also extremely tight, with an R² of 80%. And the implied annual default risk in those bond spreads has averaged around 3% over the past two years, providing cross-validation of the Statistical Ratings’ calibration to real-world default rates.
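As a rough sketch of the two quantities mentioned above, the code below computes R² as the squared Pearson correlation between two series and converts a credit spread into an implied annual default probability using the common approximation spread / (1 - recovery rate). That approximation, the recovery rate and all sample figures are assumptions made for illustration; the post does not state how the implied default risk was derived.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return (sxy * sxy) / (sxx * syy)

def spread_implied_pd(spread, recovery_rate):
    """Approximate annual default probability implied by a credit spread.

    Assumes the spread compensates only for expected credit loss,
    i.e. spread = PD * (1 - recovery_rate).
    """
    if not 0.0 <= recovery_rate < 1.0:
        raise ValueError("recovery_rate must be in [0, 1)")
    return spread / (1.0 - recovery_rate)

# Hypothetical daily observations: model default probability vs. -ln(price).
model_pd = [0.040, 0.039, 0.036, 0.031, 0.027, 0.025]
neg_log_px = [-4.0, -4.1, -4.4, -4.9, -5.3, -5.5]
print(round(r_squared(model_pd, neg_log_px), 3))

# Hypothetical 1.8% spread with an assumed 40% recovery rate.
print(round(spread_implied_pd(0.018, 0.40), 4))
```

In practice, spreads also carry liquidity and risk premia, so a spread-implied figure tends to overstate the real-world default probability unless those components are stripped out.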

Ratings Comparison

The accuracy of v3’s calibration is also demonstrated by Figure 3. This graph contrasts the average exchange default risk according to v2 and v3. Overall, including the exchanges with >20% default risk that are not shown, the R² between v2 and v3 is 33%, consistent with the inclusion of an additional 10 explanatory variables in v3. But, importantly, the overall relationship is close to a 45° line, particularly when allowing for the inclusion of macro-factors in v3. Since v1 and v2 are demonstrably well calibrated to actual observed default rates over the past two years, it follows by extension that v3 is also accurate.

Figure 3: Mean Default Rates

Finally, Figure 3 shows how v3 and v2 agree on the riskiness of the main centralized exchanges while diverging where it makes sense. Coinbase has an average default risk of 3.3% in both ratings. Likewise, Binance and Crypto.com have the same average rating between versions. Meanwhile, the ratings for OKX, Huobi, Bitpanda, Gate.io and Bitstamp remain similar. Conversely, v3 has adjusted the riskiness of two exchanges that clients have questioned. Specifically, several users thought v2’s ratings for Coinspot and Kraken were too generous. V3 rates these exchanges as riskier, with annual default probabilities of 9.7% and 7.0% respectively.

Conclusion

It's taken about a year to assemble more data, explore many thousands of explanatory variable options and converge on a new version of Agio Ratings' Statistical Ratings. The resultant v3 has several advantages over the previous version:

- Replicates: v3 retains the positive features of v2, such as accurate calibration and an intuitive response to events like Binance’s lawsuit, by including almost all of v2’s explanatory variables.

- Improves: v3 has 10 new variables that improve both its predictive performance for known defaults and its correlation with observable market forecasts, like Coinbase’s stock price.

- Extends: v3 now covers an additional 15 centralized exchanges beyond the original 30 that were rated by v2.

Having published v3 alongside this introduction, we will document the work in a more detailed whitepaper over the next few weeks. Meanwhile, we will continue to monitor v3’s ex-post predictive performance and seek to extend its coverage as we work on refining our data sources. We will also periodically refit our models as data on more defaults becomes available, probably about once or twice a year. Lastly, we will start building towards v4 with a search for additional explanatory variables not included in v3.

Appendix: V3 Coverage. New exchanges in purple.

[1] Wikipedia offers a decent introduction to these concepts: https://en.wikipedia.org/wiki/Receiver_operating_characteristic

[2] Rehm, F., Rudolf, M. (2000). KMV Credit Risk Modeling. In: Frenkel, M., Hommel, U., Rudolf, M. (eds) Risk Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04008-9_8