The dark web is a part of the Internet that is not indexed by ordinary search engines and can only be reached through special anonymizing software such as Tor. Many illegal activities take place there, including the sale of drugs, weapons, forged documents, and hacking services.
South Korean researchers have created an artificial intelligence model that can analyze dark web texts and extract useful information from them. Called DarkBERT, it is based on RoBERTa, a transformer language model introduced by Facebook AI in 2019 as an improved version of Google's BERT.
To train the model, the researchers crawled the dark web through the Tor network to build a large corpus of texts, then filtered out duplicate and off-topic pages. They used this corpus to pretrain a RoBERTa model, producing DarkBERT, a model that can process dark web texts and extract key elements from them.
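The filtering step can be illustrated with a minimal sketch. This is not the DarkBERT team's actual pipeline, whose exact criteria are not described here. It assumes pages are plain strings, treats pages as duplicates when they match after lowercasing and collapsing whitespace, and uses a hypothetical `min_words` threshold as a crude stand-in for dropping low-information or off-topic pages.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so trivially different
    # copies of the same page produce the same hash.
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_pages(pages, min_words=20):
    """Hypothetical filter: drop very short pages and exact duplicates
    (after normalization), keeping the first copy of each page."""
    seen = set()
    kept = []
    for page in pages:
        norm = normalize(page)
        if len(norm.split()) < min_words:
            continue  # too little text to be useful for training
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a page already kept
        seen.add(digest)
        kept.append(page)
    return kept


if __name__ == "__main__":
    pages = [
        "word " * 25,
        "WORD " * 25,   # same page after normalization
        "short page",   # below the word threshold
        "other " * 30,
    ]
    print(len(filter_pages(pages)))  # → 2
```

Hashing the normalized text catches only exact repeats; a real crawl would likely also need near-duplicate detection and a proper topic or quality classifier, which this sketch leaves out.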
The researchers report that DarkBERT outperforms other widely used language models, such as BERT and RoBERTa, at analyzing dark web text. This could help cybersecurity professionals and law enforcement investigate the corners of the internet where criminals operate. However, DarkBERT is still a work in progress and needs further training and tuning, and how exactly it will be used and what insights it can provide remain unclear.