An Empirical Analysis of Traceability in the Monero Blockchain

Abstract

This paper empirically evaluates two weaknesses in Monero’s mixin sampling strategy:

1. About 62% of transaction inputs with one or more mixins are vulnerable to the cascade effect (chain-reaction analysis): the real input can be deduced by elimination.
2. Monero mixins are sampled in such a way that they can be easily distinguished from the real input by their age distribution: the real input is usually the newest one.

Using the second weakness, the authors guess the real input with 80% accuracy over all transactions with one or more mixins.

In addition, the authors study the role of mining pools and the former anonymous marketplace AlphaBay in the transaction volume. After removing mining-pool activity, a large number of potentially privacy-sensitive transactions remain that are affected by these weaknesses.

To improve Monero’s anonymity, the authors propose two countermeasures:

1. A new mixin sampling method based on a two-parameter model (a gamma distribution) that closely approximates users’ real ‘spend-time’ distribution.
2. Sampling mixins in ‘bins’, which preserves some untraceability regardless of the sampling distribution.

Deducible Monero Transactions

A significant number of Monero transactions contain no mixins at all and instead explicitly identify the real TXO being spent, because users were allowed to create zero-mixin transactions in Monero’s early days. These zero-mixin transactions pose a deanonymization hazard to other transactions that include their outputs as mixins: once an output is known to be spent, it can be ruled out as the real spend of any other input.

As of April 15, 2017, a total of 12,158,814 transaction inputs had zero mixins. Figure 5 presents the fraction of transactions containing zero-mixin inputs over time.

Implementation

The authors extracted the Monero blockchain up to block 1288774 (April 15, 2017) and stored it in a Neo4j graph database (11.5 GB of data in total). The deduction algorithm is the same cascade-effect analysis as in Kumar et al. (Amrit Kumar, Clément Fischer, Shruti Tople, and Prateek Saxena. A traceability analysis of Monero’s blockchain. In Simon N. Foley, Dieter Gollmann, and Einar Snekkenes, editors, Computer Security – ESORICS 2017: 22nd European Symposium on Research in Computer Security, Oslo, Norway, September 11-15, 2017, Proceedings, Part II, pages 153–173. Springer International Publishing, 2017.)
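The cascade (chain-reaction) deduction can be sketched as a fixed-point loop: any output already known to be the real spend of one input is removed from every other ring, and a ring left with a single candidate reveals its real spend. This is a minimal Python sketch, not the authors’ Neo4j implementation; the input and output IDs are hypothetical, and rings are given as a plain dict rather than a graph database.

```python
from __future__ import annotations


def cascade_deduce(rings: dict[str, list[str]]) -> dict[str, str]:
    """Deduce real spends by elimination (cascade / chain-reaction analysis).

    `rings` maps each input ID to the output IDs its ring references.
    Returns a map from every deducible input ID to its real spent output.
    """
    deduced: dict[str, str] = {}
    spent: set[str] = set()  # outputs known to be the real spend of some input
    changed = True
    while changed:  # repeat until a full pass deduces nothing new
        changed = False
        for input_id, ring in rings.items():
            if input_id in deduced:
                continue
            # Each output can be spent only once, so an output already known to
            # be spent by another input cannot be the real spend here.
            candidates = [o for o in ring if o not in spent]
            if len(candidates) == 1:
                deduced[input_id] = candidates[0]
                spent.add(candidates[0])
                changed = True
    return deduced


rings = {
    "tx1:in0": ["out_a"],                    # zero-mixin input: real spend is explicit
    "tx2:in0": ["out_a", "out_b"],           # out_a is already spent, so out_b is real
    "tx3:in0": ["out_b", "out_c", "out_d"],  # two candidates remain: not deducible
}
print(cascade_deduce(rings))  # {'tx1:in0': 'out_a', 'tx2:in0': 'out_b'}
```

The outer loop matters because a deduction made late in one pass can unlock an input that was visited earlier in the same pass.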

Results on Deducible Transactions

Table 2 presents the results: in total, about 63% of Monero transaction inputs with one or more mixins can be deduced.

Figure 6 shows the number of deducible Monero transaction inputs by number of mixins; inputs with more mixins are less likely to be deducible.

Tracing With Temporal Analysis

Effective Untraceability

To quantify the untraceability of a transaction input, the authors use guessing entropy, which measures the expected number of incorrect guesses before finding the real spend among the outputs referenced by the input. The guessing entropy of a transaction input is defined as
$\mathrm{Ge}=\sum_{0 \leq i \leq M} i \cdot p_{i}$
where $p_0 \ge p_1 \ge \dots \ge p_M$ are the probabilities, sorted from highest to lowest, that each of the $M+1$ referenced outputs is the real spend of the transaction input.

The authors define the effective untraceability as $1 + 2\,\mathrm{Ge}$. If all referenced outputs of a transaction input are equally likely to be the real spend, then $p_i = 1/(M+1)$ and $\mathrm{Ge} = M/2$, so the effective untraceability of that input is $M+1$.
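A small Python sketch of the two definitions, assuming the per-output probabilities are already known (how an attacker would estimate them is outside this sketch):

```python
from __future__ import annotations


def guessing_entropy(probs: list[float]) -> float:
    """Ge = sum_i i * p_i, with p sorted from highest to lowest and i starting at 0."""
    p = sorted(probs, reverse=True)
    return sum(i * p_i for i, p_i in enumerate(p))


def effective_untraceability(probs: list[float]) -> float:
    """Effective untraceability, defined as 1 + 2 * Ge."""
    return 1 + 2 * guessing_entropy(probs)


# A ring with M = 4 mixins (5 referenced outputs), all equally likely:
print(effective_untraceability([0.2] * 5))  # about 5.0, i.e. M + 1
# The same ring when one output is far more likely to be the real spend:
print(effective_untraceability([0.8, 0.05, 0.05, 0.05, 0.05]))  # about 2.0
```

The second call shows why the metric is useful: the ring still has five members, but a skewed probability distribution shrinks its effective size to about two.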

The Guess-Newest Heuristic

The authors propose a heuristic: among all prior outputs referenced by a Monero transaction input, the real spend is usually the newest one. This observation comes from Figure 2.
This observation is based on zero-mixin inputs and on inputs deduced via zero-mixin inputs, whose real spends are known. The age distribution of real spends differs sharply from that of mixins, because users tend to spend coins soon after receiving them, while the mixin sampling procedure does not take users’ spending behavior into account.

The authors found that 92% of the deducible inputs could be guessed correctly this way; the results are in Table 3.
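The heuristic and its evaluation on deducible inputs, whose real spends are known, can be sketched as follows. This is an illustrative sketch, not the authors’ code: the `RingInput` type and the output IDs are hypothetical, and block height stands in for output age.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RingInput:
    ring: list[tuple[str, int]]     # (output ID, block height) of each referenced output
    real_spend: str | None = None   # known only for deducible inputs


def guess_newest(ring: list[tuple[str, int]]) -> str:
    """Guess-newest heuristic: the referenced output with the highest block height.
    Ties (outputs from the same block) go to the first one listed."""
    return max(ring, key=lambda out: out[1])[0]


def guess_newest_accuracy(inputs: list[RingInput]) -> float:
    """Share of deducible inputs whose real spend the heuristic guesses correctly."""
    labeled = [x for x in inputs if x.real_spend is not None]
    hits = sum(guess_newest(x.ring) == x.real_spend for x in labeled)
    return hits / len(labeled)


inputs = [
    RingInput([("a", 100), ("b", 950), ("c", 990)], real_spend="c"),  # newest: correct
    RingInput([("d", 500), ("e", 980), ("f", 200)], real_spend="e"),  # newest: correct
    RingInput([("g", 970), ("h", 400), ("i", 10)], real_spend="h"),   # newest: wrong
    RingInput([("j", 300), ("k", 600)]),                              # not deducible: skipped
]
print(guess_newest_accuracy(inputs))  # 2 of 3 deducible inputs guessed correctly
```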

Countermeasures

1. Improve the mixin sampling procedure to match the real spend-time distribution of Monero users.
2. Introduce a countermeasure called binned mixin sampling, which modifies the current mixin sampling procedure.

Estimating the spend-time distribution

The paper compares the cumulative distribution functions (CDFs) of spend times in the Bitcoin and Monero blockchains.
Bitcoin spend times have a shape somewhat similar to Monero spend times. The authors use R’s fitdistr function to fit a gamma distribution (shape parameter 19.28, rate parameter 1.61) to the spend times, taken as the logarithm of the spend time in seconds.
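The paper uses R’s `fitdistr`, which performs maximum-likelihood fitting. As a simpler stand-in, the sketch below fits a gamma distribution by the method of moments (shape = mean²/variance, rate = mean/variance). Real spend-time data is not available here, so it draws synthetic log spend times from the paper’s fitted parameters and checks that the fit recovers them; with real data, pass the logarithms of observed spend times in seconds.

```python
from __future__ import annotations

import math
import random


def fit_gamma_moments(samples: list[float]) -> tuple[float, float]:
    """Method-of-moments gamma fit: shape = mean^2 / var, rate = mean / var."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean * mean / var, mean / var


# Synthetic stand-in for real data: log spend times (log of seconds) drawn from
# Gamma(shape=19.28, rate=1.61). random.gammavariate takes (shape, scale), and
# scale = 1 / rate.
rng = random.Random(0)
log_spend_times = [rng.gammavariate(19.28, 1 / 1.61) for _ in range(50_000)]
shape, rate = fit_gamma_moments(log_spend_times)
print(f"shape = {shape:.2f}, rate = {rate:.2f}")
# The model's mean log spend time, converted back to hours for intuition.
print(f"exp(mean) = {math.exp(shape / rate) / 3600:.1f} hours")
```

Method-of-moments estimates are usually close to maximum-likelihood ones for a well-behaved sample like this, but they are not identical, so the numbers will not exactly reproduce `fitdistr` output.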

Sampling mixins using the spend-time distribution

The authors use the distribution above to sample mixins whose ages match the real spend-time distribution. Their method is:

1. Sample a target timestamp from the distribution (a sampled spend time, counted back from the current time).
2. Find the nearest block containing at least one RingCT output.
3. Sample uniformly among the transaction outputs in that block.
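The three steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the paper’s algorithm verbatim: the `Block` type and output IDs are hypothetical, the gamma model is taken over log spend time in seconds, and sampled ages that predate the oldest eligible block are simply redrawn.

```python
from __future__ import annotations

import bisect
import math
import random
from dataclasses import dataclass

# Assumed model: gamma distribution over log(spend time in seconds).
GAMMA_SHAPE = 19.28
GAMMA_RATE = 1.61


@dataclass
class Block:                      # hypothetical minimal block record
    height: int
    timestamp: float              # Unix time, seconds
    ringct_outputs: list[str]     # hypothetical RingCT output IDs


def nearest_block(blocks: list[Block], target: float) -> Block:
    """Step 2: the block whose timestamp is closest to `target`.
    `blocks` must be non-empty and sorted by timestamp."""
    times = [b.timestamp for b in blocks]
    i = bisect.bisect_left(times, target)
    neighbours = blocks[max(i - 1, 0):i + 1]  # the blocks just before and after target
    return min(neighbours, key=lambda b: abs(b.timestamp - target))


def sample_mixin(blocks: list[Block], now: float, rng: random.Random,
                 max_tries: int = 1000) -> str:
    eligible = [b for b in blocks if b.ringct_outputs]  # at least one RingCT output
    for _ in range(max_tries):
        # Step 1: sample a spend time and count it back from `now`.
        age = math.exp(rng.gammavariate(GAMMA_SHAPE, 1 / GAMMA_RATE))
        target = now - age
        if target < eligible[0].timestamp:
            continue  # assumption: redraw ages older than the eligible chain
        # Steps 2 and 3: nearest eligible block, then a uniform pick inside it.
        return rng.choice(nearest_block(eligible, target).ringct_outputs)
    raise RuntimeError("no sampled age fell inside the chain")


# Toy chain: 30,000 two-minute blocks; every third block has no RingCT outputs.
now = 1_700_000_000
blocks = [
    Block(h, now - (30_000 - h) * 120,
          [] if h % 3 == 0 else [f"out{h}_{k}" for k in range(2)])
    for h in range(30_000)
]
rng = random.Random(1)
print([sample_mixin(blocks, now, rng) for _ in range(5)])
```

Sampling a time first and only then a block keeps the mixin age distribution close to the target even when blocks are unevenly filled; the uniform pick in step 3 then spreads the choice over that block’s outputs.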

The paper gives a more detailed version of this procedure as an algorithm.

Monte Carlo Simulation

The paper’s Monte Carlo simulation compares the effective-untraceability set size under the current sampling regime with that under the proposed mixin sampling routine: the proposed routine increases it significantly. It performs slightly worse than the ideal at 6 and 12 months out, but it is still much better than the current method and stays within 75% of the ideal.
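The intuition behind the simulation can be reproduced with a toy Monte Carlo experiment. This is not the paper’s simulation: it measures how often the guess-newest heuristic succeeds rather than the effective-untraceability set size, it models age-agnostic sampling as uniform over an assumed two-year chain, and it reuses the fitted gamma model for real spend ages.

```python
from __future__ import annotations

import math
import random

GAMMA_SHAPE, GAMMA_RATE = 19.28, 1.61   # fitted model over log(spend time in seconds)
CHAIN_AGE = 2 * 365 * 24 * 3600         # assumed chain age: two years, in seconds


def spend_age(rng: random.Random) -> float:
    """Age of a real spend, drawn from the gamma model over log seconds."""
    return math.exp(rng.gammavariate(GAMMA_SHAPE, 1 / GAMMA_RATE))


def uniform_age(rng: random.Random) -> float:
    """Age-agnostic mixin: uniform over the whole (assumed) chain history."""
    return rng.uniform(0, CHAIN_AGE)


def guess_newest_success(mixin_age, mixins: int = 4, trials: int = 10_000,
                         seed: int = 0) -> float:
    """Fraction of simulated rings in which the real spend is the newest output."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        real = spend_age(rng)
        decoys = [mixin_age(rng) for _ in range(mixins)]
        hits += real < min(decoys)
    return hits / trials


print("uniform mixins:", guess_newest_success(uniform_age))
print("matched mixins:", guess_newest_success(spend_age))  # near 1 / (mixins + 1)
```

When mixins are drawn from the same distribution as real spends, the real spend is the newest member of the ring only by chance, so the heuristic’s success rate drops to roughly one in five for four mixins.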

The second countermeasure aims to preserve some untraceability even when the mixin sampling distribution is badly compromised.

Binned mixin sampling

Binned mixin sampling groups the outputs in the Monero blockchain into sets of a fixed size, called bins, such that all outputs in a bin were confirmed in the same block or in neighboring blocks, as shown in Figure 13.

Any transaction input that references an output in a bin, either as a mixin or as the real spend, must also reference all other outputs in that bin. Thus, a real spend cannot be distinguished by age from the other outputs in its bin. Additionally, binned mixin sampling ensures that the outputs in a bin cannot be deduced as spent until the last unspent output in the bin is spent, which prevents deduction attacks from reducing the effective untraceability of an output below the bin size.

The sampling procedure is given as Algorithm 2 in the paper.
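A minimal sketch of binned sampling, assuming outputs are already ordered by confirmation height and that only full bins are used; the function names are hypothetical. For brevity, the decoy bins are chosen uniformly here, whereas a real implementation would draw them from a mixin sampling distribution.

```python
from __future__ import annotations

import random


def make_bins(outputs: list[str], bin_size: int) -> list[list[str]]:
    """Split outputs, already ordered by confirmation height, into consecutive
    bins of exactly `bin_size` outputs, so each bin spans the same or
    neighbouring blocks. Leftover outputs wait until a full bin exists (assumption)."""
    full = len(outputs) - len(outputs) % bin_size
    return [outputs[i:i + bin_size] for i in range(0, full, bin_size)]


def binned_ring(real_output: str, bins: list[list[str]], num_bins: int,
                rng: random.Random) -> list[str]:
    """Ring = the real output's entire bin plus (num_bins - 1) other entire bins."""
    real_bin = next(i for i, b in enumerate(bins) if real_output in b)
    others = [i for i in range(len(bins)) if i != real_bin]
    # Assumption: decoy bins are chosen uniformly here for brevity.
    chosen = sorted(rng.sample(others, num_bins - 1) + [real_bin])
    return [out for i in chosen for out in bins[i]]


outputs = [f"out{i}" for i in range(100)]
bins = make_bins(outputs, bin_size=4)
ring = binned_ring("out42", bins, num_bins=3, rng=random.Random(7))
print(len(ring), ring)  # 12 outputs: three whole bins, one of them out40..out43
```

Because every ring contains whole bins, the real spend always sits among bin-mates of essentially the same age, which is what defeats the guess-newest heuristic inside a bin.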

Recommendations

The mixin sampling distribution should be modified to more closely match the real spend-time distribution.
Avoid including publicly deanonymized transaction outputs as mixins.
Monero users should be warned that their prior transactions are likely vulnerable to tracing analysis.
The authors launched a block explorer (https://monerolink.com) that displays the linkages between transactions inferred using their techniques; they also recommend developing a wallet tool that users can run locally to determine whether their previous transactions are vulnerable.

Thinking

Some points about this paper:

Zero-mixin transactions are smaller and cost less in fees, which the authors take as the reason users chose them. However, they do not consider whether users chose zero mixins because there were not enough outputs of the same denomination available as mixins. The paper [A Traceability Analysis of Monero’s Blockchain](https://blog.csdn.net/t46414704152abc/article/details/89175204) presents a more rigorous analysis of this.
The authors state that newer transactions are immune not because of the RingCT mechanism itself, but because RingCT was deployed after the mandatory minimum of 2 mixins was enforced. I have doubts about this claim: why would that be? The authors do not present an analysis to support it.
Maybe next time we should analyze the untraceability of RingCT transactions.