Exploration — Analysis of Bitcoin’s Complex Transaction Network Relationship

Donut Protocol
Aug 4, 2021

Complex transaction network relationship analysis mainly includes node centrality analysis, abnormal transaction structure discovery, and capital flow path discovery. This article combines these complex network analysis techniques to explore the interpretability of illegal transactions in an anti-money laundering setting.

Network node centrality analysis

Social network analysis (SNA) algorithms measure the information carried by each node in a network and quantify how important that node is within the network. They were first applied to small-world social networks to identify the key points in them. Knowledge-graph-based anti-money laundering can borrow the same idea to mine key information in the graph, such as key accounts, key individuals, and key transactions. Commonly used social network analysis algorithms include PageRank, Betweenness Centrality, Degree Centrality, Eigenvector Centrality, Closeness Centrality, and so on.

Centrality features representation

Graph node centrality features are interpreted as follows:

Node centrality analysis aims to locate the key transactions in the network. The working assumption is that the more critical a transaction is, the more likely it is to be involved in illegal activity. However, the original feature set does not include node centrality indicators, and no interpretability analysis of the network has been carried out from the perspective of node centrality.

Centrality features analysis

Running graph centrality algorithms on the current anti-money laundering network yields a centrality feature vector for each node, made up of 10 indicators:

  1. Cen_pagerank: PageRank node importance
  2. Degree: Degree
  3. Outdegree: Outdegree
  4. Indegree: Indegree
  5. Cen_bet: Betweenness Centrality
  6. Cen_in_bet: In-degree Betweenness Centrality
  7. Cen_out_bet: Out-degree Betweenness Centrality
  8. Cen_eigen: Eigenvector Centrality
  9. Cen_clo: Closeness Centrality
  10. Cen_harmonic: Harmonic Centrality
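
As a minimal sketch of how the three degree indicators in the list can be derived, the snippet below counts them from a directed edge list. The edge list and the function name `degree_features` are hypothetical and only illustrate the idea; an edge (a, b) is assumed to mean that bitcoin flows from transaction a to transaction b.

```python
from collections import defaultdict


def degree_features(edges):
    """Return {node: {"Degree", "Indegree", "Outdegree"}} for a directed edge list."""
    feats = defaultdict(lambda: {"Degree": 0, "Indegree": 0, "Outdegree": 0})
    for src, dst in edges:
        feats[src]["Outdegree"] += 1
        feats[dst]["Indegree"] += 1
    for f in feats.values():
        # Degree counts every edge touching the node, regardless of direction.
        f["Degree"] = f["Indegree"] + f["Outdegree"]
    return dict(feats)


# Hypothetical toy transaction graph (not taken from the article's data).
edges = [("t1", "t2"), ("t2", "t3"), ("t2", "t4"), ("t5", "t2")]
features = degree_features(edges)
print(features["t2"])
```

Here t2 receives from two transactions and sends to two, so all three of its counts are easy to check by hand.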

Calculating the IV rank of each indicator shows that the centrality feature vector set discriminates strongly between classes in the transaction node classification model.
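
The article does not spell out how IV is computed. Assuming IV stands for the information value commonly used in risk scoring (a sum over bins of the difference between the illegal and legal shares, weighted by the log of their ratio), a rough sketch could look like the following. The equal-frequency binning, the smoothing constant `eps`, and the toy data are all assumptions made for illustration.

```python
import math


def information_value(values, labels, n_bins=5, eps=0.5):
    """IV of one numeric feature against binary labels (1 = illegal, 0 = legal)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_bad = sum(labels)
    total_good = n - total_bad
    iv = 0.0
    for b in range(n_bins):
        # Equal-frequency bins over the sorted values (ties may straddle bins).
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        bad = sum(label for _, label in chunk)
        good = len(chunk) - bad
        # Additive smoothing keeps the logarithm finite when a bin is pure.
        p_bad = (bad + eps) / (total_bad + eps * n_bins)
        p_good = (good + eps) / (total_good + eps * n_bins)
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


# Hypothetical data: 10 legal then 10 illegal transactions.
labels = [0] * 10 + [1] * 10
candidates = {
    "separating": list(range(20)),  # low values legal, high values illegal
    "uninformative": [2 * i for i in range(10)] + [2 * i + 1 for i in range(10)],
}
iv_scores = {name: information_value(vals, labels) for name, vals in candidates.items()}
iv_rank = sorted(iv_scores, key=iv_scores.get, reverse=True)
print(iv_rank)
```

Sorting features by this score in descending order gives a ranking in the spirit of the IV rank tables, with the caveat that the binning choices above are not taken from the article.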

Table 1 Graph centrality feature vector IV RANK
Figure 1 Outdegree, degree

Figure 1 shows the IV of the outdegree and degree indicators. Most Bitcoin transactions have only one outgoing or one incoming transaction; that is, after bitcoin flows into an illegal transaction node, it flows straight out to the next transaction. The illegal transaction node serves only as a transit node that passes the funds along. Illegal transactions are often hidden in this simple transaction pattern, and the share of illegal behavior drops sharply as degree increases.

Figure 2 Cen_harmonic and Cen_clo

Figure 2 shows the IV plots of the harmonic centrality and closeness centrality indicators, both of which show a good linear trend. Closeness centrality measures the distance between a node and the other nodes within the same connected subgraph, and harmonic centrality extends this to non-connected subgraphs. Both describe how easily a node can transmit its information across the network, so they can be used to locate key points in the network. It can therefore be inferred that the more critical a node is in the network, the less likely it is to be an illegal transaction; key transactions are usually legal.
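
The difference between the two indicators can be sketched with a breadth-first search. This is a simplified illustration rather than the article's implementation: it treats edges as undirected and unweighted, computes closeness only over the nodes reachable from each node, and uses a hypothetical toy graph with two separate components.

```python
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_and_harmonic(edges):
    """Return {node: (closeness, harmonic)} on the undirected view of the graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    result = {}
    for node in list(adj):
        dist = bfs_distances(adj, node)
        others = [d for n, d in dist.items() if n != node]
        # Closeness: reachable nodes divided by their total distance (same component only).
        closeness = len(others) / sum(others) if others else 0.0
        # Harmonic: sum of inverse distances; unreachable nodes simply contribute 0.
        harmonic = sum(1.0 / d for d in others)
        result[node] = (closeness, harmonic)
    return result


# Hypothetical graph with two components: a-b-c and x-y.
scores = closeness_and_harmonic([("a", "b"), ("b", "c"), ("x", "y")])
print(scores["b"], scores["x"])
```

In this toy graph b and x get the same closeness because each is one hop from everything in its own component, while harmonic centrality separates them because b reaches more nodes.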

Centrality features network analysis

This part visualizes the centrality features on the network and observes the transaction patterns of illegal transaction nodes in the original topology.

Figure 3 The node with the highest PageRank in the network

The central node in Figure 3 above is the node with the highest PageRank, which marks it as the most important node in the current subgraph. It has many inputs and outputs and may be the most active node for transfer transactions. Such nodes deserve close attention, as they may carry a higher risk of illegality.
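
To make the PageRank score concrete, here is a minimal power-iteration sketch in plain Python. The damping factor of 0.85, the fixed iteration count, the handling of nodes without outgoing edges, and the toy graph are assumptions for illustration; the article does not state which implementation or parameters were used.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list; ranks sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical subgraph: three transactions feed "hub", which pays out to two others.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "d"), ("hub", "e")]
ranks = pagerank(edges)
top_node = max(ranks, key=ranks.get)
print(top_node)
```

The node with many inputs ends up with the highest score, which matches the kind of node highlighted in Figure 3.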

Figure 4 The node with the highest betweenness centrality in the network

The node marked in Figure 4 above is the node with the highest betweenness centrality in the network. It may be a transitional (bridging) node between many sub-transaction networks. It connects other transaction nodes in the network and may be an intermediary account.
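
Betweenness centrality counts how many shortest paths pass through a node. The sketch below uses Brandes' algorithm on an unweighted directed graph and returns unnormalized scores; it is an illustration under those assumptions, and the toy graph is hypothetical.

```python
from collections import deque


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm, directed, unweighted)."""
    nodes = {n for edge in edges for n in edge}
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        stack = []
        preds = {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {n: 0.0 for n in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical bridge: funds from a and b can only reach c and d through m.
bet = betweenness([("a", "m"), ("b", "m"), ("m", "c"), ("m", "d")])
print(bet["m"])
```

All four source-to-destination paths in the toy graph run through m, so it is the only node with a non-zero score, which is the bridging role described for Figure 4.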

Centrality index aggregation analysis

Centrality index aggregation can be understood as learning the first-order and second-order attribute representations of network nodes. The first-order attribute information is the first-degree relevance index, and the second-order attribute information is the second-degree relevance index. Here we only discuss the first-order and second-order content representation of the central node.

First-order aggregation analysis

First-order aggregation analysis includes first-order out (>) relationship aggregation, first-order in (<) relationship aggregation, and first-order undirected (<>) relationship aggregation. A total of 252 first-order aggregation feature indexes are obtained.
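
A first-order aggregation can be sketched as follows: for each node, collect its out (>), in (<), and undirected (<>) neighbors and summarize one centrality indicator over them. The set of aggregation functions, the feature naming scheme, and the toy scores are assumptions for illustration; the article does not list the exact aggregations behind its 252 features.

```python
from collections import defaultdict

# Assumed aggregation functions; the article mentions at least sums and minimums.
AGGREGATORS = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": lambda xs: sum(xs) / len(xs),
}


def first_order_aggregates(edges, centrality, name):
    """Aggregate one centrality indicator over each node's first-order neighbors."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nb[src].add(dst)
        in_nb[dst].add(src)
    feats = {}
    for node in centrality:
        directions = {
            ">": out_nb[node],
            "<": in_nb[node],
            "<>": out_nb[node] | in_nb[node],
        }
        row = {}
        for symbol, neighbours in directions.items():
            values = [centrality[n] for n in neighbours]
            for agg_name, agg in AGGREGATORS.items():
                # Nodes without neighbors in a direction get 0.0 by convention here.
                row[f"{name}_{symbol}_{agg_name}"] = agg(values) if values else 0.0
        feats[node] = row
    return feats


# Hypothetical graph and made-up centrality scores.
edges = [("a", "b"), ("a", "c"), ("d", "a")]
scores = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
agg_feats = first_order_aggregates(edges, scores, "Cen_pagerank")
print(agg_feats["a"]["Cen_pagerank_>_min"])
```

Repeating this for every centrality indicator produces one block of aggregated columns per indicator and direction.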

Table 2 below shows the IV rank of the 252 variables (only an excerpt is shown). The first-order aggregation features discriminate better than the node centrality indicators: Outdegree only ranks 19th.

Table 2 First-order aggregate feature index IV RANK

Figure 6 above shows the IV of the first-order undirected aggregated harmonic centrality sum and of the harmonic centrality indicator. The first-order undirected harmonic centrality sum has a better linear trend than harmonic centrality itself. From the transaction pattern it can be inferred that the more central a transaction node is in the network, the lower the probability that the transaction is illegal (illegal transactions need to stay low-key). The illegality probability of the surrounding first-order neighbors is also low, since transactions associated with a legal transaction are usually legal.

Figure 7 First-order out aggregation PageRank minimum value, PageRank

Figure 7 above shows the IV of the first-order outbound aggregated PageRank minimum and of the PageRank indicator, which have opposite trends. The larger a transaction's PageRank value, the higher its status in the network and the lower its illegal rate. However, the larger the minimum PageRank among its first-order outbound transactions, the higher the illegal rate.

Second-order aggregation analysis

Second-order aggregation analysis is more complex than first-order aggregation, and more cases need to be considered. Figure 8 below covers 9 direction combinations, such as first-order out (>) followed by second-order out (>) relationship aggregation and first-order in (<) followed by second-order out (>) relationship aggregation, giving a total of 3 * 3 * 10 * 5 = 450 second-order aggregation feature indexes.

Figure 8 Aggregation of second-order centrality indicators
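
The 3 * 3 * 10 * 5 count and the notion of a second-order neighborhood can be sketched as below. The ten indicator names come from the list earlier in the article; the five aggregator names are placeholders, since the article does not name them, and excluding the starting node from its own second-order neighborhood is an assumption.

```python
from collections import defaultdict
from itertools import product

DIRECTIONS = (">", "<", "<>")
INDICATORS = ("Cen_pagerank", "Degree", "Outdegree", "Indegree", "Cen_bet",
              "Cen_in_bet", "Cen_out_bet", "Cen_eigen", "Cen_clo", "Cen_harmonic")
# Placeholder names: the article implies five aggregations but does not list them.
AGGREGATORS = ("agg1", "agg2", "agg3", "agg4", "agg5")


def build_neighbours(edges):
    """Neighbor sets per direction: out (>), in (<), undirected (<>)."""
    nb = {d: defaultdict(set) for d in DIRECTIONS}
    for src, dst in edges:
        nb[">"][src].add(dst)
        nb["<"][dst].add(src)
        nb["<>"][src].add(dst)
        nb["<>"][dst].add(src)
    return nb


def second_order_neighbours(nb, node, first, second):
    """Nodes reached by one hop in direction `first`, then one hop in direction `second`."""
    reached = set()
    for mid in nb[first].get(node, ()):
        reached |= nb[second].get(mid, set())
    reached.discard(node)  # assumption: a node is not its own second-order neighbor
    return reached


feature_names = [f"{ind}_{d1}{d2}_{agg}"
                 for d1, d2, ind, agg in product(DIRECTIONS, DIRECTIONS, INDICATORS, AGGREGATORS)]

# Hypothetical graph: a -> b -> c and d -> b.
nb = build_neighbours([("a", "b"), ("b", "c"), ("d", "b")])
print(len(feature_names), second_order_neighbours(nb, "a", ">", ">"))
```

Aggregating a centrality indicator over each of these nine neighborhoods then works the same way as in the first-order case.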

Table 3 shows the IV rank of the second-order aggregation features (only an excerpt is shown). Only some of the second-order aggregation features outperform the centrality variables, and all of them are slightly worse than the first-order aggregation features. This suggests that transactions on long links are not especially likely to be illegal, and that most money laundering transactions do not in fact have complicated transaction patterns.

Table 3 Second-order aggregation feature IV rank
Figure 9 Second-order aggregate harmonic centrality summation, first-order aggregate harmonic centrality summation

Figure 9 above shows that although the second-order aggregated harmonic centrality sum follows the same trend as the first-order aggregated harmonic centrality sum, it discriminates less well. Money launderers want to finish quickly and do not want excessive turnover in the middle of their transactions.

Table 4 Centrality aggregation index model

Note: C is the node centrality index, C1 is the first-order aggregation index, C2 is the second-order aggregation index, AF is the original feature set, and RF is a random forest (n_estimators=50, max_features=100). The evaluation metrics in the table are computed for illegal transactions.

As can be seen from Table 4 above, the model RF (C + C1 + C2 + AF), which integrates the centrality aggregation indicators, improves accuracy by 1%, recall by 5%, and F1 by 4% compared with the original feature set model RF (AF).
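
For reference, metrics evaluated on the illegal class can be computed as in the sketch below, which treats illegal transactions as the positive class (label 1). Precision is shown alongside recall and F1 as an assumption, since the article does not define exactly how its accuracy figure is computed; the labels and predictions are made up.

```python
def illegal_class_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 with the illegal class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 4 illegal and 6 legal transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = illegal_class_metrics(y_true, y_pred)
print(precision, recall, f1)
```

Comparing these three numbers between a model trained on AF alone and one trained on C + C1 + C2 + AF is the kind of comparison Table 4 reports.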

Network transaction structure discovery

Abnormal transaction structure discovery looks for patterned abnormal capital structures in the network, for example:

  1. Frequent import/export
  2. Chain transaction structure
  3. Centralized transfer in/out
  4. Decentralized transfer in/out
  5. Circular trading structure

Figure 10 Illegal chain transaction structure

Figure 10 above shows an illegal chain transaction structure in the network. The nodes are linked by one-way transfer-out relationships, so the starting node and final node of the chain can be located and the complete transaction chain traced.
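
One simple way to trace such chains, sketched under assumptions of my own, is to start from nodes with no incoming edge and exactly one outgoing edge, then follow the single outgoing edge while each next node has exactly one incoming edge and at most one outgoing edge. The minimum chain length and the toy graph are hypothetical.

```python
from collections import defaultdict


def find_chains(edges, min_nodes=3):
    """Return one-way chains as node lists, from starting node to final node."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
        nodes.update((src, dst))
    chains = []
    for start in sorted(nodes):
        # A chain starts at a node nothing flows into and that has a single outflow.
        if indeg[start] != 0 or len(succ[start]) != 1:
            continue
        chain = [start]
        nxt = succ[start][0]
        # Follow the chain while nodes stay simple pass-through links.
        while indeg[nxt] == 1 and len(succ[nxt]) <= 1:
            chain.append(nxt)
            if not succ[nxt]:
                break  # reached the final node of the chain
            nxt = succ[nxt][0]
        if len(chain) >= min_nodes:
            chains.append(chain)
    return chains


# Hypothetical graph: one four-step chain plus a node that fans out to two others.
edges = [("t1", "t2"), ("t2", "t3"), ("t3", "t4"), ("h", "x"), ("h", "y")]
chains = find_chains(edges)
print(chains)
```

The first and last element of each returned list correspond to the starting node and final node of the chain.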

Figure 11 Illegal centralized transfer into a legal transaction

Figure 11 shows illegal centralized transfer-in transactions. The trading center they transfer into is a legal transaction. This is evidently a planned transfer, mixed in with legal transfer transactions, through which the goal of “washing money” is finally achieved.

Figure 12 Illegal centralized transfer

Figure 12 above shows illegal centralized transfer-out transactions. A transaction center node responsible for money laundering often plays a pass-through role: most of the money transferred in is transferred out again through various channels. Therefore, by calculating the discount rate of a transaction node (transfer-out amount / transfer-in amount) and combining it with the centralized transfer-out pattern, illegal intermediary centers can be identified.
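
The discount rate check can be sketched as follows, assuming each transfer is available as a (source, destination, amount) triple. The two thresholds, the function name, and the amounts are hypothetical and would need tuning on real data.

```python
from collections import defaultdict


def flag_transfer_out_hubs(transfers, min_out_degree=3, min_rate=0.9):
    """Flag nodes that fan out to many nodes and pass on most of what they receive."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    out_degree = defaultdict(int)
    for src, dst, amount in transfers:
        outflow[src] += amount
        out_degree[src] += 1
        inflow[dst] += amount
    flagged = {}
    for node, received in inflow.items():
        if received <= 0:
            continue
        # Discount rate as defined in the text: transfer-out amount / transfer-in amount.
        rate = outflow[node] / received
        if out_degree[node] >= min_out_degree and rate >= min_rate:
            flagged[node] = rate
    return flagged


# Hypothetical transfers: "hub" receives 10.0 and sends 9.5 onward to three nodes.
transfers = [
    ("a", "hub", 10.0),
    ("hub", "b", 3.0),
    ("hub", "c", 3.0),
    ("hub", "d", 3.5),
    ("e", "f", 5.0),
]
flagged = flag_transfer_out_hubs(transfers)
print(flagged)
```

Nodes that never receive funds are skipped here because their rate is undefined.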

Analysis of capital flow path

Capital flow path analysis studies the connections between transaction nodes in the transaction network and uses a shortest path algorithm to find the most direct intermediaries between individuals. The shortest path between transactions can help locate illegal transactions directly. Based on the current path analysis algorithm, the following scheme is proposed.

Figure 13 Analysis of Capital Flow Path
Figure 14 Shortest path

In Figure 14 above, a shortest path analysis between two illegal transaction nodes makes it easy to find other illegal transaction nodes on the path.
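
A breadth-first search is enough to sketch this kind of shortest path query on an unweighted graph. The example below follows the direction of capital flow, which is an assumption, and the edges and the set of known illegal nodes are hypothetical.

```python
from collections import defaultdict, deque


def shortest_path(edges, source, target):
    """Shortest directed path from source to target as a node list, or None."""
    succ = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            # Walk the parent pointers back to the source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in succ[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


# Hypothetical flows: a short route and a longer route between two illegal nodes.
edges = [("i1", "m1"), ("m1", "m2"), ("m2", "i2"),
         ("i1", "x"), ("x", "y"), ("y", "z"), ("z", "i2")]
known_illegal = {"i1", "i2", "m1"}
path = shortest_path(edges, "i1", "i2")
illegal_on_path = [n for n in path[1:-1] if n in known_illegal]
print(path, illegal_on_path)
```

The intermediate nodes on the returned path are the candidates to review first, and any that are already labelled illegal stand out immediately.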

Legal transactions often follow the principle of minimum cost and shortest time, and are unlikely to have a long capital path. Therefore, according to the path analysis, we can find some important transaction paths.

Conclusion

This work applies complex network analysis to the anti-money laundering transaction network and explores new approaches to the anti-money laundering task from the perspectives of node centrality, N-order aggregation, network structure discovery, and capital flow paths.

The conclusions are as follows:

1. The first-order aggregation centrality indexes perform better than the node centrality indexes, which suggests that the more first-order neighborhood attribute information is captured around a node, the more the model improves.

2. Compared with the original feature set model RF (AF), the RF (C + C1 + C2 + AF) model with centrality aggregation indexes improves accuracy by 1%, recall by 5%, and F1 by 4%, where C is the node centrality feature set, C1 is the first-order aggregation feature set, C2 is the second-order aggregation feature set, and AF is the original feature set. From the modeling perspective, the centrality aggregation indexes improve performance; from the visualization perspective, the centrality feature indexes make it possible to find key nodes quickly.

3. Network transaction structure discovery can reveal illegal transaction patterns in the network and supports retrospective tracing of cases.

4. Capital flow path analysis can find more illegal transaction nodes and adds interpretability to illegal transaction patterns.
