Deep Explorations Series - Part 1

5 min read

I have recently taken up an interest in revisiting and exploring the "Deep Web" (a.k.a. DarkNet / UnderNet / Hidden Net) - all names for the invisible chunk of the web, the place where no general-purpose search engine will willingly take you. But what if you want to be taken there? Most people do, once they consider that it makes up an estimated 95% of the World Wide Web (just imagine how much unique information is stored there!). During my exploration, it struck me as a curious mixture of "the place to be" and "the place to stay away from" - all wrapped in one. Obviously, if you are keen to explore it, neither of those attitudes will do - unless you are a fearless hacker (or your local equivalent). Throughout this post, I'll assume you are not one of those mystical creatures.

Infographic (image by unknown artist)

For starters, let's look at what the Deep Web really is. According to Wikipedia, it is "content that is not part of the surface web", where the "surface web" is the chunk that is indexed by standard search engines. Considering that only about 0.03% of pages get indexed (roughly 1 in 3,000), it becomes evident that for comprehensive information retrieval, a surface search alone may not suffice.

Why are certain pages not indexed? Very good question! It turns out this happens for many possible reasons. It is usually due to technical barriers, which may or may not have been placed there deliberately by the site's owner. Those barriers prevent "web crawlers" (a.k.a. "web spiders" / "robots", etc.) from accessing the content.

One such barrier is dynamic content - what you find on sites that do not have static pages but generate content in response to a query (scripted pages), or sites that serve content based on the user's identity (password-protected pages). Since search engines can't type in keywords, enter passwords or solve CAPTCHAs, they end up ignoring such content altogether. Another barrier is a site's 'robots.txt' file, which gives the site's admin the power to prevent all or certain "robots" from accessing the content by specifying permissions.
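To make the robots.txt mechanism concrete, here is a minimal sketch of such a file - the paths and the "BadBot" user-agent name are made up for illustration:

```text
# Hypothetical robots.txt served at https://example.com/robots.txt
User-agent: *          # rules for all crawlers...
Disallow: /members/    # ...keep out of the members-only area
Disallow: /search      # ...and out of query-generated result pages

User-agent: BadBot     # one specific crawler
Disallow: /            # is banned from the entire site
```

Note that robots.txt is purely advisory: well-behaved crawlers honour it, but nothing technically stops a rogue one from ignoring it.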

Food for Thought (figures from BrightPlanet's widely cited 2001 study of the deep Web):

  • Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.
  • The deep Web contains 7,500 terabytes of information compared to 19 terabytes of information in the surface Web.
  • The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web.
  • More than 200,000 deep Web sites presently exist.
  • Sixty of the largest deep-Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times.
  • The deep Web is the largest growing category of new information on the Internet.
  • Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.
  • Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.
  • Deep Web content is highly relevant to every information need, market, and domain.
  • More than half of the deep Web content resides in topic-specific databases.
  • A full ninety-five percent of the deep Web is publicly accessible information — not subject to fees or subscriptions.

The Deep Web is a place where bitcoins serve as currency, and where any kind of information can be found - be it revolution plans, crazy scientific experiments or underground fighting tournaments. You can come across the military, the police, kidnappers, scientists, terrorists and much, much more. In the words of a fellow DW explorer, "the party there goes across the entire moral spectrum".

Basically, we are talking about information hidden in plain sight - to find it, you generally need to know where and how to look. Sometimes you can use custom "Deep Search Engines" to find hidden content - like the one you will find at the address ahmia.fi - but those index only a small fraction of the pages. At the time of writing, ahmia was indexing exactly 1003 hidden web sites - which, you will agree, is not much at all.

If you have navigated to that address already (it does open in regular browsers), you might have noticed that it is indeed a search engine - with the title "Tor Hidden Service ('Onion') Search". Let's look at what those terms mean.

Tor is an acronym - a popular naming choice in the tech community - and it takes its name from the project's original title, "The Onion Router" (despite a common belief, it is not a recursive acronym).

Onion routing is, according to Wikipedia, a "technique for anonymous communication over a computer network". Data is repeatedly encrypted and sent through several network nodes called onion routers. Like someone peeling an onion, each onion router removes a layer of encryption to uncover routing instructions, and sends the message to the next node, where the process is repeated.
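To illustrate just the layering idea, here is a minimal Python sketch. Everything in it is made up for illustration (the `wrap`/`peel` names, the node names, the keys), and it deliberately uses XOR as a stand-in cipher - real onion routing uses proper encryption and per-circuit key negotiation, not anything like this:

```python
import json

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for encryption: XOR with a repeating key.
    # Real onion routers use proper ciphers (e.g. AES) - never XOR.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def wrap(message: bytes, route, destination: str) -> bytes:
    """Sender: add one layer per node, innermost layer first.

    route is a list of (node_name, node_key) pairs, entry node first.
    Each layer tells exactly one node where to forward the rest.
    """
    next_hops = [name for name, _ in route[1:]] + [destination]
    packet = message
    for (name, key), next_hop in zip(reversed(route), reversed(next_hops)):
        layer = json.dumps({"next": next_hop, "payload": packet.hex()})
        packet = xor_bytes(layer.encode(), key)
    return packet

def peel(packet: bytes, key: bytes):
    """One node: remove exactly one layer, learning only the next hop."""
    layer = json.loads(xor_bytes(packet, key).decode())
    return layer["next"], bytes.fromhex(layer["payload"])

route = [("entry", b"key-a"), ("middle", b"key-b"), ("exit", b"key-c")]
packet = wrap(b"hello onion", route, "destination")
for name, key in route:            # each node peels its own layer...
    next_hop, packet = peel(packet, key)
    print(f"{name} forwards to {next_hop}")
print(packet)                      # ...until the plain message emerges
```

The property the toy preserves is the important one: each node can decrypt only its own layer, so it learns only its predecessor and its successor - no single node ever sees both the sender and the final destination.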

These are the most recurrent terms in my exploration of the DW, because they are at the very heart of the system. To access a site that uses the Tor network for anonymity, you first need all the necessary software on your machine (usually, the Tor Browser Bundle). You can also use a proxy service like Tor2web, which can be reached from regular (non-Tor-aware) browsers and search engines - basically giving up all anonymity - but keep in mind that much of this traffic was anonymized for a reason: trouble might be just around the corner.

A good list with warnings and precautions can be found here.

Just to emphasize some points:

  • Do not use any extra plugins in your Tor browser, as those can be manipulated into identifying you.
  • Use HTTPS instead of HTTP [where possible] for additional encryption on your connection.
  • Do not use your Tor setup to open your personal accounts such as email, Facebook, etc.
  • The Tor network will give you access to invaluable resources that are not available anywhere else, but you *must* resist the temptation of opening the downloaded files while online. In fact, it is advisable to open them on a separate, disconnected machine, or in a VM (Virtual Machine) such as VirtualBox with networking disabled.

Ultimately, in an onion routing network, the more nodes there are, the better the protection. That being so, the more diverse the users - and the content they open - the better. It's up to us to encourage other people to safely use Tor, thus making the protection stronger.

Once you have a working install of the Tor Browser Bundle with all precautions taken, you can start "diving deep" and exploring the things most people do not have access to. It is important to note, however, that as in any anonymous system, anarchy reigns in many corners of the deep web. When we are not bound to our identities, we tend to give in to things we otherwise would not have done. There is a lot of content in the DW that is by all accounts repulsive and degenerate - discretion in exploring such a place is very important. If you know something is "wrong" and it goes against your principles, the wise thing to do, perhaps, is to stay away.

An awesome video resource on the Deep Web can be found here. It is a great idea to head there and absorb the information before your first dive :)

Got Tor? Read and understood the warnings? Watched the video?

Then the two links below should get you started with your Deep Explorations. Cheers!

Copy and paste into Tor browser (NB: NOT to be opened in a general-purpose browser):

http://kpvz7ki2v5agwt35.onion/wiki/index.php/Main_Page <- this will take you to the [in]famous Hidden Wiki. You can find pretty much anything there.

http://ahmia.fi <- Deep Web search engine. One of them.

Tip: look for pages that contain collections of links.

Personally, I go for the tech forums and such - the information contained there is invaluable! Also, be sure to check out the libraries.

If you have any questions or suggestions, you are most welcome to getintouch() or just post a comment below.