By Johnmichael O’Hare
Online sources open a new world of information that can help detectives find threat actors, speed up investigations and protect lives. But there’s a catch: As part of any successful investigation, police departments must effectively collect, ingest and analyze vast amounts of data. Fortunately, a data management strategy and supporting technologies provide a way to tame the data explosion.
These are the top online challenges law enforcement investigators face and how they work around them:
Information management was arduous enough when investigators relied mostly on paper documents housed in filing cabinets. Online sources have significantly increased the amount of data potentially available to aid in an investigation. One terabyte of data, hardly a remarkable amount by today’s standards, can contain more than 80 million document pages. The paper equivalent would take up an absurd number of filing cabinets.
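To put the terabyte figure in perspective, the arithmetic can be sketched in a few lines. The 80 million pages per terabyte comes from the paragraph above; the capacity of a filing cabinet is an assumed round number chosen only for illustration, not a figure from this article.

```python
# Back-of-the-envelope check of the "1 TB holds 80 million pages" figure.
TERABYTE_BYTES = 10**12           # decimal terabyte
PAGES_PER_TERABYTE = 80_000_000   # figure quoted in the article

bytes_per_page = TERABYTE_BYTES / PAGES_PER_TERABYTE

# Assumption (not from the article): one filing cabinet holds
# roughly 10,000 sheets of paper.
PAGES_PER_CABINET = 10_000
cabinets = PAGES_PER_TERABYTE / PAGES_PER_CABINET

print(f"{bytes_per_page:,.0f} bytes per page")
print(f"{cabinets:,.0f} filing cabinets")
```

Under those assumptions, each page averages 12,500 bytes and a single terabyte corresponds to about 8,000 cabinets of paper.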
That’s the essence of the so-called big data problem: sifting through an overabundance of data to identify the pivotal, high-value information. And it’s not just the volume of data that’s the issue. In addition to structured data such as database records, organizations must also cope with unstructured data such as digital photos, video, social asset data and the aforementioned documents. There’s also the gray area of semi-structured datasets, which lack the structural characteristics of a database record but include elements such as tagging that define a document.
Police detectives need both a systematic management approach and technology to process big data. On the management side, they also require compliance policies that govern online investigations, keeping in mind the salient regulations and legal principles.
The technical underpinnings of online investigations include a repository for storing large volumes of data and specialized tools for searching the data store and analyzing the information. Natural language processing (NLP), a type of artificial intelligence (AI), is critical here. NLP converts human language, whether text or spoken word, into a format a computer can process.
With NLP, investigators can query the database using code words, jargon, hashtags and keywords associated with threat actors, groups or activities – local terminology for narcotics sold within a jurisdiction, for example. This AI-based approach lets organizations sift through terabytes of data to find the information they need to advance an investigation.
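As a rough illustration of querying collected text for code words, hashtags and local jargon, the sketch below does plain case-insensitive term matching. It is a deliberately simplified stand-in for the NLP tooling described above, not a description of any particular product. The watchlist terms, the sample posts and the function names are all hypothetical.

```python
import re

# Hypothetical watchlist: terms an investigator might associate with a case.
WATCHLIST = {"#exampletag", "bluebird", "package drop"}

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def matched_terms(text: str, watchlist: set[str]) -> set[str]:
    """Return the watchlist terms that appear in text as whole words or phrases."""
    haystack = normalize(text)
    hits = set()
    for term in watchlist:
        # Lookarounds keep "bluebird" from matching inside "bluebirds".
        pattern = r"(?<!\w)" + re.escape(normalize(term)) + r"(?!\w)"
        if re.search(pattern, haystack):
            hits.add(term)
    return hits

posts = [
    "New Bluebird in stock, PACKAGE  DROP tonight #ExampleTag",
    "Bluebirds nesting in the park",
]
for post in posts:
    print(sorted(matched_terms(post, WATCHLIST)), "<-", post)
```

The first sample post matches all three terms; the second matches none, because the term must appear as a whole word. A production system would go further, for example by handling misspellings, slang variants and context.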
When threat actors conduct business online, they enjoy a certain level of anonymity. The surface web – the commonly used layer of social assets and indexed websites – provides the ability to use “handles” or create fake accounts to mask identity. This basic anonymity intensifies in the web’s deep and dark layers, which are not indexed via conventional search engines. The dark web, in particular, enables sophisticated threat actors to conceal themselves using anonymizing routers and proxy servers, for example. Threat actors in the dark web may traffic in stolen credit card data, sell illicit drugs or cultivate extremism. The scale of the various web layers – featuring some 1 billion-plus websites – coupled with various cloaking approaches complicates online investigations. Where do detectives start their investigation to positively identify nefarious individuals?
Threat actors, however, leave digital footprints as they traverse the web’s layers: a dark website linked to a social asset on the surface web or an encryption key assigned to a regular email account, for instance. Deanonymization boils down to connecting the dots among bits of information gleaned from the surface web and the online world’s subterranean tiers. But to succeed in identifying threat actors, investigators will need a knowledge of the dark web, relevant web intelligence (WEBINT) techniques and a specialized browser that can access hidden sites, forums and marketplaces. As with combing through big data, investigators should consider enlisting AI to help piece together identities. AI and the related field of machine learning can help law enforcement agencies correlate the bits of information that surface during an investigation, assisting with deanonymization. Investigative experience and intuition remain paramount, but AI technology can extend those human capabilities.
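The idea of connecting the dots can be sketched as grouping records that share an identifier, such as an email address or an encryption key. The example below uses a simple union-find structure to link records directly or through a chain of shared identifiers. It is a minimal illustration under invented data, not the AI-driven correlation the article describes; every record, identifier and function name here is hypothetical.

```python
from collections import defaultdict

# Hypothetical records: accounts observed on different web layers, each with
# the identifiers (email, key fingerprint, handle) found alongside it.
records = [
    {"id": "surface:profile-17", "identifiers": {"email:a@example.org", "handle:nightowl"}},
    {"id": "dark:vendor-3", "identifiers": {"pgp:ABCD1234"}},
    {"id": "surface:mail-9", "identifiers": {"email:a@example.org", "pgp:ABCD1234"}},
    {"id": "surface:profile-42", "identifiers": {"handle:daybreak"}},
]

def correlate(records):
    """Group records linked, directly or transitively, by a shared identifier."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # identifier -> first record that carried it
    for r in records:
        for ident in r["identifiers"]:
            if ident in first_seen:
                union(r["id"], first_seen[ident])
            else:
                first_seen[ident] = r["id"]

    clusters = defaultdict(set)
    for r in records:
        clusters[find(r["id"])].add(r["id"])
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

for cluster in correlate(records):
    print(cluster)
```

In this invented data, the email account ties the surface profile to the dark web vendor through the shared encryption key, so those three records land in one cluster while the unrelated profile stays on its own. A shared identifier is only a lead; an investigator would still need to verify any such link.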
Timeliness and the reliability of information are top concerns in any investigation. Online inquiries introduce some considerations specific to electronic data gathering and analysis, chief of which is volume. Finding actionable, accurate intelligence in a sea of data takes time and effort. Resource-strapped organizations relying on manual searches and data sleuthing will probably find the job simply takes too long when time is critical. Agency leadership will soon drop support for online investigations that span days and appear to produce nothing. After all, taking too long to find a threat actor could give them time to disappear or result in more crimes being committed.
WEBINT combined with AI, however, can automate searches, dramatically accelerating online investigations while also improving accuracy. This intelligent automation should, ideally, span the surface, deep and dark web layers. It should also enable investigators to create custom search parameters, which could include details such as a threat group’s hashtag, terminology and location data (names of countries, cities, streets, etc.). The ability to program a holistic search and turn it loose across the web saves time. But investigative organizations also need the power of intelligent automation to rapidly correlate data. Relying solely on detectives to find the connections among seemingly disparate pieces of data will add hours, if not days, to an investigation. Correlation also helps unmask threat actors, as noted, leading investigators toward the data they need to maintain as evidence.
Faster investigation means faster interdiction. When searchers uncover threat actors’ plans – whether an organized retail theft or extremist action – agencies can protect property and, potentially, save lives. Automation also provides business value and return on investment for police departments. Greater investigative efficiency means investigators spend fewer hours on inquiries, resulting in proportional cost savings. This ultimately helps create a positive image for police departments. As more criminal cases are solved, the public perception that crime is being managed improves. Agencies can achieve a virtuous circle.
Getting a jump on threat actors’ plans depends on obtaining reliable threat intelligence. The threat intelligence domain aims to proactively acquire information on emerging dangers or crimes that are being planned so police officers can institute preventative strategies and tactics. The practice is often associated with financial institutions fending off cyberattacks, but it also applies to law enforcement agencies.
Indeed, law enforcement departments with poor threat intelligence can be caught off guard. A detective investigating the shipment of illegal firearms could miss ‘online chatter’ between buyers and sellers planning to traffic the weapons and receive payment for the shipment. The problem often stems from a lack of investigative tools or approaches that are limited in scope. For example, police investigators may maintain good intelligence on individuals who sell illegal firearms at the ‘street-level,’ but lack the ability to probe what is happening in the online world where planning, pricing, logistics and trading are discussed. Similarly, an agency using tools limited to the surface web will find it difficult to anticipate extremist groups that plan their actions on the dark web. In both cases threat intelligence is poor or non-existent.
Automated WEBINT facilitates threat intelligence. The ability to quickly aggregate and search social asset data, for example, can help expose relationships among threat actors and gain insight into their plans. Dark web search capabilities offer an additional window into activities in the works. Overall, data-driven threat intelligence can help agencies snuff out problems before they materialize, with obvious benefits for lives and property. Agencies can also make strides toward greater investigative efficiency. Threat intelligence can help investigators prioritize threats and focus their energies on the most pressing risks. That way, agencies can deploy human intelligence (HUMINT) to its greatest effect. That’s a huge plus for agencies facing staffing constraints.
The end game of an investigation is preserving and presenting evidence that leads to indictments and convictions. Converting data gathered online to evidence requires due diligence to make sure agencies have properly identified the threat actor and the online platform used to make a threat. Investigators will send preservation letters to the relevant platforms so the information is maintained and safeguarded. The subpoena process then follows.
Detectives often find the conversion process challenging. The issues range from learning how to request data from a hyperscale web platform to documenting online investigative methods. But WEBINT and intelligent automation can support this task. AI’s precision and ability to construct finely tuned searches build confidence in the trustworthiness of the data, expediting due diligence.
Automation also comes into play at the end of the subpoena process, which can result in a data dump of staggering proportions. Investigative agencies should have an automated system on hand for ingesting and processing large data sets. Without such a mechanism, an agency’s online investigation can go to waste.
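One common way to handle a very large data dump is to stream it record by record rather than loading it all into memory. The sketch below assumes, purely for illustration, that the returned data arrives as JSON Lines (one JSON object per line); real platform returns vary in format, and the field names and sample records here are invented.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator

def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a real pipeline would log or quarantine this line
        if isinstance(record, dict):
            yield record

def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Tally records by one field without holding the whole dump in memory."""
    counts = Counter()
    for record in iter_records(lines):
        counts[record.get(field, "unknown")] += 1
    return counts

# Hypothetical sample standing in for a much larger file on disk.
sample = io.StringIO(
    '{"account": "user-a", "type": "message"}\n'
    '{"account": "user-a", "type": "login"}\n'
    'not json\n'
    '{"account": "user-b", "type": "message"}\n'
)
print(count_by_field(sample, "type"))
```

Because `iter_records` is a generator, the same functions work unchanged on an open file handle of any size. For evidentiary purposes, skipped or malformed lines should be recorded rather than silently dropped.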
Online investigations require a comprehensive, automated method
An online investigation can tap a rich store of information previously unavailable to law enforcement agencies. But that potential will remain unrealized without a comprehensive strategy and appropriate technology for gathering and analyzing massive amounts of data. The melding of WEBINT, automation and time-tested HUMINT speeds up investigations and increases confidence in the data generated. AI structures precise queries that provide a wide-angle view of threat actors and increase confidence in data quality. And data correlation pieces together informational breadcrumbs to reveal threat actor identities.
Taken together, those techniques and tools help agencies overcome the dual challenge of big data and short deadlines.
Johnmichael O’Hare is the sales and business development director of Cobwebs Technologies (www.cobwebs.com). He is the former Commander of the Vice, Intelligence, and Narcotics Division for the Hartford (Connecticut) Police Department. Prior to that, he was the Project Developer for the City of Hartford’s Capital City Command Center (C4), a Real-Time Crime Center (RTCC) that reaches throughout Hartford County and beyond. C4 provided real-time and investigative support for local, state, and federal law enforcement partners utilizing multiple layers of forensic tools, coupled with data resources, and real-time intelligence.