By all accounts, the public internet is a big place. According to some estimates, it now hosts over 1 billion websites. Experts predict that internet users will send or receive over 2 zettabytes of data in 2019. In common terms, that’s 2 billion terabytes (see figure below). If you can’t see the circle for the amount of data stored in a typical home computer, you are not alone. Even if you zoomed in on this page to 10,000%, the dot on the right would still only be 5 nanometers in diameter.
Figure 1: comparison of yearly public internet traffic and the data stored in a typical home computer
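For readers who want to check the conversion, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where a terabyte is 10^12 bytes and a zettabyte is 10^21 bytes. The function name is ours, for illustration only.

```python
# Decimal (SI) byte units, expressed as powers of ten.
TERABYTE = 10**12
ZETTABYTE = 10**21


def zettabytes_to_terabytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to terabytes."""
    return zettabytes * ZETTABYTE // TERABYTE


yearly_traffic_tb = zettabytes_to_terabytes(2)
print(f"{yearly_traffic_tb:,} TB")  # 2,000,000,000 TB
```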
As big as the internet is, the private internet – or deep web – is still much bigger. The deep web consists of all the information behind a login wall. There is a good reason for the existence of a private internet. We would not want all our financial, health and even voting records publicly accessible for anyone to see. Recent estimates suggest that the deep web may be as much as 25 times larger than the public internet. The largest website on the private internet is unsurprisingly Facebook.
The aim of this blog post is to explore an even more private section of the internet: Tor hidden services. The U.S. Navy developed the Tor network with three aims in mind. The first was to protect the integrity of communications between its bases in the U.S. and its personnel traveling abroad. To do so, all communications are encrypted with asymmetric encryption keys. These keys are very difficult, if not impossible, to crack unless you have one of those fancy quantum computers we keep hearing about.
Figure 2: Fancy quantum computer [1]
The second aim of the Tor network was to hide the destination of the communications of the U.S. Navy’s remote personnel. An agent in Iran, for example, would not want their communications identified as going to a U.S. Navy base. To achieve this, the Tor network uses a series of relays which make it very difficult to follow the packets as they move through the internet. Unless, once again, you have one of those fancy quantum computers (see Figure 2).
Now, if only U.S. Navy personnel used the Tor network, all this encryption would be rather useless: any Tor traffic would obviously belong to the Navy. That is why the third and last aim of the Tor network was to open the network to the public. This generates noise in which the U.S. Navy’s traffic can hide. Tor hidden services are websites hosted on the Tor network. They benefit from all the encryption that the Tor network provides. It is therefore very difficult to identify the physical server that is hosting their content and to profile their visitors. According to the Tor Project, the NGO now managing the Tor network, there are anywhere between 50,000 and 120,000 hidden services active on a daily basis.
Many studies (here [2], here [3] and here [4]) have analyzed the content found on Tor hidden services. They discovered that much of it was related to online illicit markets, drugs and child pornography. Financial fraud, hacking and identity fraud were also common on Tor hidden services. Figure 3 presents a summary of Spitters et al.’s findings in 2014. These results were generated by web crawlers which indexed the content of Tor hidden services, much as Google’s indexing robots do on the public internet. The crawlers visit the homepage of a Tor hidden service, looking for hyperlinks to other webpages on the same website or on different websites. The crawlers then follow each link until they have exhausted their queue.
Figure 3: Content found on Tor hidden services
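The crawling loop described above can be sketched in a few lines. This is a simplified illustration, not the code used in the studies cited: `fetch` is a hypothetical callable that returns a page's HTML (or `None` on failure), and the `.onion` addresses are made-up placeholders served from an in-memory dictionary rather than from the Tor network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=1000):
    """Breadth-first crawl starting at seed_url.

    fetch is a hypothetical callable returning the HTML of a URL,
    or None if the page cannot be retrieved.
    Returns a dict mapping each visited URL to the URLs it links to.
    """
    queue = deque([seed_url])
    seen = {seed_url}
    link_map = {}
    while queue and len(link_map) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links and drop #fragments.
        targets = [urldefrag(urljoin(url, href)).url for href in parser.links]
        link_map[url] = targets
        for target in targets:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return link_map


# Toy in-memory "network" standing in for real hidden services.
PAGES = {
    "http://aaa.onion/": '<a href="/about">About</a> <a href="http://bbb.onion/">B</a>',
    "http://aaa.onion/about": '<a href="/">Home</a>',
    "http://bbb.onion/": '<a href="http://aaa.onion/">A</a>',
}

result = crawl("http://aaa.onion/", PAGES.get)
for page, links in result.items():
    print(page, "->", links)
```

The `seen` set is what lets the crawler exhaust its queue: each URL is enqueued at most once, so the loop stops even when websites link back to each other.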
Flare Systems Inc. developed its own web crawling infrastructure, built for agility and efficiency. Each task of the web crawler is distributed among a number of nodes. The nodes work in parallel to download webpages, authenticate on websites which require visitors to supply a username and password, and index the content of the webpages. Through them, Flare Systems Inc. builds its own map of the darknet where each dot is a Tor hidden service, and each line is a hyperlink between Tor hidden services. You are welcome to try and count the Tor hidden services if you are bored at work, but we’ll give you the answer to how many there are below Figure 4.
Figure 4: Map of the darknet
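To give an idea of how such a map can be assembled, here is a minimal sketch that collapses page-level links into service-level dots and lines. It is an illustration under our own assumptions, not Flare Systems' production code: the input format (`link_map`), the function names and the placeholder `.onion` addresses are all hypothetical.

```python
from urllib.parse import urlsplit


def service_of(url):
    """Return the hostname of a URL if it is a .onion address, else None."""
    host = urlsplit(url).hostname
    if host and host.endswith(".onion"):
        return host
    return None


def build_service_graph(link_map):
    """Collapse page-level links into (source, target) service edges."""
    nodes, edges = set(), set()
    for page, targets in link_map.items():
        source = service_of(page)
        if source is None:
            continue
        nodes.add(source)
        for target_url in targets:
            target = service_of(target_url)
            # Skip non-onion targets and links within the same service.
            if target is None or target == source:
                continue
            nodes.add(target)
            edges.add((source, target))
    return nodes, edges


# Hypothetical crawl output: page URL -> URLs linked from that page.
link_map = {
    "http://aaa.onion/": ["http://aaa.onion/about", "http://bbb.onion/"],
    "http://bbb.onion/": [
        "http://aaa.onion/",
        "http://ccc.onion/market",
        "https://example.com/",
    ],
}

nodes, edges = build_service_graph(link_map)
print(len(nodes), "services,", len(edges), "links")
```

Storing edges in a set means that many hyperlinks between the same two services count as a single line on the map, which is one possible design choice among others.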
There are many ways to determine how important each Tor hidden service is. Past studies have sought to measure their internet traffic. In this blog post, we’ll instead use social network analysis to build a list of the Top 5 Tor hidden services based on three metrics. The first is called degree centrality. It is a count of the number of incoming hyperlinks from other websites. The more mentions and links a Tor hidden service receives, the more likely it is to be important in a network. The second metric is closeness centrality. It is based on the number of hops that a visitor would need to go from one Tor hidden service to any other Tor hidden service. It is in a sense a measure of the diameter of the darknet. The third metric is PageRank. It is based on Google’s algorithm for identifying the most relevant websites. This relevance is measured both by the number of incoming hyperlinks and by the notoriety of the Tor hidden services those hyperlinks come from. Table 1 presents the Top 5 Tor hidden services for each metric.
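The three metrics can be sketched in plain Python. This is a simplified illustration rather than the exact computation behind our results: the closeness score here is the inverse of the average number of hops to the services a visitor can actually reach, the PageRank damping factor of 0.85 is the commonly used default, and the toy graph and its service names are invented.

```python
from collections import deque


def successors_of(nodes, edges):
    """Map each node to the list of nodes it links to."""
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)
    return successors


def in_degree(nodes, edges):
    """Count incoming hyperlinks for each node."""
    counts = {n: 0 for n in nodes}
    for _, target in edges:
        counts[target] += 1
    return counts


def hop_counts(start, successors):
    """Breadth-first search: hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(nodes, edges):
    """Reachable nodes divided by the total hops needed to reach them."""
    successors = successors_of(nodes, edges)
    scores = {}
    for n in nodes:
        dist = hop_counts(n, successors)
        total = sum(dist.values())
        scores[n] = (len(dist) - 1) / total if total else 0.0
    return scores


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; dangling nodes spread their rank evenly."""
    nodes = sorted(nodes)
    successors = successors_of(nodes, edges)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if not successors[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for n in nodes:
            for target in successors[n]:
                new[target] += damping * rank[n] / len(successors[n])
        rank = new
    return rank


def top(scores, k=5):
    """Return the k highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented toy graph: three services link to a directory, which links to a chat.
nodes = {"directory", "chat", "market", "hosting"}
edges = {("chat", "directory"), ("market", "directory"),
         ("hosting", "directory"), ("directory", "chat")}

degree_scores = in_degree(nodes, edges)
closeness_scores = closeness(nodes, edges)
pagerank_scores = pagerank(nodes, edges)
print(top(degree_scores), top(pagerank_scores))
```

In this toy graph the directory comes first on both degree and PageRank, while the chat service ranks second on PageRank despite a single incoming link, because that link comes from the most notable service.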
Three main takeaways can be drawn from Table 1. First, using social network analysis, we find that the importance of legitimate websites on the darknet is much higher than previously thought. Indeed, chat services, web hosting services and directories are not illicit per se, though they may be used for nefarious reasons (or some serious time wasting [5]). The darknet is therefore not necessarily all about threats and crime, depending on how you rate the relevance of Tor hidden services.
The second takeaway is that the illicit activities we have discovered appear to be centered around illicit markets, either general (e.g. DreamMarket) or specific (e.g. carding). These have been cited as some of the most prevalent content on the darknet, and our study finds a similar result.
Finally, Tor hidden services appear to be weakly connected to each other, as our indexing only managed to collect data on a small subset of the Tor hidden services that are available. Many of those Tor hidden services are likely single-page tests or empty websites, but the difficulty in reaching them contrasts with the public internet, where connectivity between websites is much higher. Further analysis shows that as many as 10 hops are necessary to move from one Tor hidden service to the one that is furthest away. The public internet can be crossed in 19 hops, almost twice as many, but with 10,000 times more websites.
In summary, this study finds that the darknet is more diverse than originally believed, and that the metric used to describe it strongly shapes the results.
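The hop count quoted above is the longest shortest path in the map. A minimal sketch of how such a figure can be computed follows, assuming that hyperlinks are followed in their own direction and that pairs of services with no path between them are ignored. The five-node graph is invented for illustration.

```python
from collections import deque


def longest_shortest_path(nodes, edges):
    """Return the largest hop count between any reachable ordered pair."""
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)
    longest = 0
    for start in nodes:
        # Breadth-first search gives the fewest hops from start to each node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in successors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


# Invented example: a ring of four services plus one isolated service.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]

hops = longest_shortest_path(nodes, edges)
print(hops)
```

Running one breadth-first search per service is fine for a map of a few thousand dots; a much larger graph would call for sampling or an approximate method.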
[1] Source: https://www.techworm.net/wp-content/uploads/2017/09/maxresdefault-1.jpg