Embracing the zeitgeist, in 2017 The Economist declared that data should be considered one of the most valuable of resources, more valuable than oil and gas. The value of data analytics was beginning to be understood then, and it has since become the go-to tool for making informed decisions in industries such as finance, healthcare, and marketing. The defence and security sector is no exception. Data is a vital weapon system in modern warfare, and the ability to leverage Big Data analytics can provide unparalleled decision-making advantages to the military. The importance of context and timeliness cannot be overstated.
Big Data vs Data
Big Data arrived with much fanfare around ten years ago and has been redefined a couple of times since as the technology has matured. For the layman, Big Data means extremely large data sets that may be analysed computationally to reveal patterns, trends, and relationships. This means looking not only at your monthly bank statement but at everyone’s in the country.
Analytics: Artificial Intelligence
Historically, analytics involved intelligence officers poring over images, radio intercepts and human intelligence in an attempt to get a sense of strategic and perhaps tactical activity. Now introduce Big Data, for example the footage from an Intelligence, Surveillance and Reconnaissance (ISR) drone. You can use that live data only within the parameters of the mission, but to extract maximum value from the ISR flight, the military will want to pull every point of interest and every movement around the mission from the wide-area imaging technology. In a test case, the US DoD captured four million points of interest and 200,000 tracks from a 15-minute flight. A team took two months to process this Big Data dataset manually, by which time the actionable information was out of date. By using AI (and the disciplines of Machine Learning and Natural Language Processing included therein), this data could be processed in near real time.
Context is another crucial factor that makes data meaningful. At Quantexa Ltd, a seven-year-old UK unicorn which includes ex-MI5 chief Lord Jonathan Evans on its advisory board, we would explain context thus: “you wouldn’t buy a house after peering through the letter box, you would review the sales history, school options, council taxes and a raft of other information”. Quantexa was established initially to search for money launderers and financiers of terrorism. It applies context by looking not just at an individual, but at the individual’s financial relationships, patterns of behaviour, social media, data about their address and so forth. Consider this simple example: an address with four people with the same surname is a family unit, an address with four people with different surnames is a flat-share, and an address with four different surnames sharing one mobile number is a fraud operation. That information is gleaned by looking at council tax data and bank data; there’s no need to even bring in the mobile phone companies!
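The address example above can be sketched in a few lines of Python. This is a minimal illustration only: the function name, the input shape (surname and mobile number pairs) and the sample numbers are assumptions made for the sketch, not a description of how Quantexa implements it.

```python
def classify_address(residents):
    """Classify one address from (surname, mobile_number) pairs.

    Applies only the three rules from the text; anything that fits
    none of them is returned as 'unclassified'.
    """
    surnames = {surname.strip().lower() for surname, _ in residents}
    mobiles = {mobile.replace(" ", "") for _, mobile in residents}

    if len(residents) < 2:
        return "unclassified"
    if len(surnames) == 1:
        return "family unit"
    all_different_surnames = len(surnames) == len(residents)
    if all_different_surnames and len(mobiles) == 1:
        return "possible fraud operation"
    if all_different_surnames:
        return "flat-share"
    return "unclassified"


# Hypothetical sample data (numbers are from a reserved fictional range).
family = [("Smith", "07700 900001"), ("Smith", "07700 900002"),
          ("smith", "07700 900003"), ("Smith", "07700 900004")]
flat_share = [("Smith", "07700 900001"), ("Jones", "07700 900002"),
              ("Patel", "07700 900003"), ("Khan", "07700 900004")]
shared_mobile = [("Smith", "07700 900009"), ("Jones", "07700 900009"),
                 ("Patel", "07700 900009"), ("Khan", "07700 900009")]
```

A real system would treat the last label as a prompt for investigation rather than a verdict, which is why the sketch says "possible".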
Without context, data is just a collection of numbers and information that is difficult to interpret and make sense of. Contextualised data can be used to provide actionable intelligence that can help prevent terrorist attacks. Context comes from data, and lots of it; it highlights anomalies, patterns and relationships. Consider the case of a domestic terrorist attacker: in a single bank account there is enough data to identify several strong indicators of a potential terrorist. These include a recent credit of a loan from a high-cost short-term credit provider (e.g. a payday loan), the distribution of all available funds to family members, and repeated payments demonstrating a presence in an unusual location, e.g. spending all day at the kebab and burger joint 300m from the entrance to, say, an army barracks.
Research shows that whilst traditional criminals are spontaneous, terrorists go to great lengths in planning their attacks, and often commit other crimes or demonstrate other behaviours while doing so. A number of filters have been used in the past, fighting-age males being an obvious one, but filters alone cannot produce accurate, actionable intelligence without a huge number of false positives. These false positives are what you get when you know your terrorist is called John Smith but there are 30,000 John Smiths in the UK, meaning (hopefully) 29,999 are the wrong John Smith.
Security forces should use Big Data analytics to process data from a wide range of sources, including phone data, social media, dark web, web search history, media, and of course foreign and domestic classified information. Add to this data from bank accounts, credit cards, corporate registers, geodemography and even, I suppose, library book borrowings. This data is then analysed using advanced and constantly evolving data mining and machine learning techniques to identify patterns and trends that could indicate potential threats to security.
Another critical factor in avoiding false positives is Entity Resolution (ER). How many times have you been frustrated that your bank doesn’t know who you are? Half of this comes down to unlinked systems. The retail bank simply doesn’t know you have a mortgage with them except for the outgoing payments, and they have no clue that your partner also banks with them, or that your parents have been loyal customers for 50 years. The failures are down to data management. Leaving aside for a minute that data is held on different systems, it is also recorded to different standards: Y2K, the fixing of DD/MM/YY to DD/MM/YYYY, is one example. Another would be different data collection standards. Who knew they needed to ask about the bank account holder’s sex before 1918, let alone record it? When your parents’ bank accounts were computerised in the 1960s, were their dates of birth recorded correctly, or were they recorded, like hundreds of thousands of others, as 01/01/60?
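As a rough illustration of the data-standards problem, the sketch below normalises dates of birth recorded as DD/MM/YY or DD/MM/YYYY and flags values that look like a default entry. The function names are hypothetical, and two rules are assumptions made for the sketch: a two-digit year is placed in the most recent century that does not put the birth in the future, and any 1 January date is treated as a possible placeholder.

```python
from datetime import date


def parse_dob(text, today):
    """Parse 'DD/MM/YY' or 'DD/MM/YYYY' into a date.

    Assumption: a date of birth cannot be in the future, so a two-digit
    year is read as 20YY unless that would fall after `today`.
    """
    day_s, month_s, year_s = text.strip().split("/")
    day, month, year = int(day_s), int(month_s), int(year_s)
    if len(year_s) == 2:
        year += 2000
        if (year, month, day) > (today.year, today.month, today.day):
            year -= 100
    return date(year, month, day)


def is_suspect_default(dob):
    """Flag 1 January dates, a common placeholder when the real date is unknown."""
    return dob.day == 1 and dob.month == 1


# Hypothetical reference date, fixed so the example is reproducible.
reference_day = date(2024, 6, 1)
legacy_dob = parse_dob("01/01/60", reference_day)
modern_dob = parse_dob("15/03/1985", reference_day)
```

Flagged records are not necessarily wrong, since some people really are born on 1 January, so a flag of this kind would lower the weight given to the date when matching rather than discard the record.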
Consider the example of John Smith. A common enough problem is working out whether the Jonathan Smith who bought the ammonium nitrate is the same JSmith who bought the pressure cooker, who is the same Jonny Smith who searched online “how to make a b*o*m*b”. Assuming that this data came from multiple sources, traditional filtering would never find the right individual, but if you have enough data, you can eliminate the false positives and arrive at a true positive. To paraphrase Sherlock Holmes, once you have eliminated the false positives, only the true positives can remain. A sub-feature of Big Data, ER uses a combination of data from different sources to corroborate (or otherwise) individuals and corporates. It is vitally important to do Entity Resolution when working with Big Data, as ER will tell you that the JSmith living at 123 The High Street is the same entity as the Jonny Smith who took a payday loan last week.
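A toy version of Entity Resolution might look like the following. It is a deliberately naive sketch, not how a production ER engine works: it assumes each record carries a name and a postcode, reduces the name to a first initial plus surname, and links records only when that key and the postcode both agree. The record fields, source labels and postcodes are all invented for illustration.

```python
import re
from collections import defaultdict


def name_key(name):
    """Reduce 'Jonathan Smith', 'JSmith' or 'Jonny Smith' to ('j', 'smith')."""
    # Insert a space before an inner capital so 'JSmith' becomes 'J Smith'.
    spaced = re.sub(r"(?<=[A-Za-z])(?=[A-Z])", " ", name.strip())
    parts = spaced.lower().split()
    return (parts[0][0], parts[-1])


def postcode_key(postcode):
    """Normalise spacing and case so 'ab1 2cd' matches 'AB12CD'."""
    return postcode.replace(" ", "").upper()


def resolve(records):
    """Group record sources that share a name key and a postcode."""
    clusters = defaultdict(list)
    for record in records:
        key = (name_key(record["name"]), postcode_key(record["postcode"]))
        clusters[key].append(record["source"])
    return dict(clusters)


# Hypothetical records from four unconnected sources.
records = [
    {"source": "retailer_a", "name": "Jonathan Smith", "postcode": "AB1 2CD"},
    {"source": "retailer_b", "name": "JSmith", "postcode": "ab12cd"},
    {"source": "search_log", "name": "Jonny Smith", "postcode": "AB1 2CD"},
    {"source": "bank", "name": "John Smith", "postcode": "ZZ9 9ZZ"},
]
clusters = resolve(records)
```

The three spellings at the same postcode fall into one cluster while the John Smith elsewhere stays separate. The obvious weakness is that a father and son called John and Jonathan at one address would also merge, which is why real ER corroborates across many more attributes (date of birth, phone, account numbers) and scores matches rather than applying a single hard rule.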
The pièce de résistance of Big Data is Network Analysis (NA): the ability to find hidden relationships, the cornerstone of every detective story plot. NA has traditionally been an example of mobile phone owner + phone use = owner’s location. That was useful for a short while in the Middle East, and then saw a brief and surprising uptick in usefulness in 2022 in eastern Ukraine before dropping off in popularity again. NA with Big Data is more than that; it links multiple sources of data to allow a likely conclusion. An abstract example of this was when we found that anyone with a credit card from a certain provider, who bought advertising space on a certain adult website, and who also paid for a single female on a one-way low-cost-carrier flight from central Europe, was either people-trafficking or worse. This is not the end of it, because NA allowed the investigators to see whom they received money from and paid money to, properties rented, landlords, bosses (in the gangster meaning of the word), businesses where the money was laundered and so on, and of course the customers/paedophiles. Thus, one pattern of payments unveiled an entire network of collaborators.
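The idea of starting from one flagged entity and walking outwards through its relationships can be sketched as a breadth-first search over a graph of payments and other links. Everything here is an assumption made for illustration: the entity labels, the edge list and the choice of a hop limit are hypothetical, and real network analysis would run over resolved entities rather than raw names.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (entity_a, entity_b, relation) edges."""
    graph = defaultdict(set)
    for entity_a, entity_b, _relation in edges:
        graph[entity_a].add(entity_b)
        graph[entity_b].add(entity_a)
    return graph


def network_around(graph, seed, max_hops):
    """Return {entity: hops_from_seed} for everything within max_hops of seed."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        entity = queue.popleft()
        if hops[entity] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[entity]:
            if neighbour not in hops:
                hops[neighbour] = hops[entity] + 1
                queue.append(neighbour)
    return hops


# Hypothetical links discovered from payment and tenancy data.
edges = [
    ("cardholder", "landlord", "pays rent to"),
    ("cardholder", "boss", "receives money from"),
    ("boss", "front business", "pays into"),
    ("unrelated person", "unrelated shop", "pays"),
]
graph = build_graph(edges)
network = network_around(graph, "cardholder", max_hops=2)
```

The hop limit matters in practice: without it, a well-connected entity such as a utility company would pull most of the population into the network.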
Artificial intelligence can be used to analyse data quickly and accurately, providing insights that can be used to prevent terrorist attacks and protect national security interests. Consider a subset of AI, Machine Learning (ML), which looks at the outcomes of many decisions and attempts to replicate those decisions immediately. This allows huge amounts of data to be reviewed in real time. An easy example is sanctions. Say a German manufacturer exports 5,000 washing machines each month to Belarus (a country bordering Russia), but after new Russian sanctions in February 2022 the orders increase from 5,000 to 105,000; it would be easy for ML to spot that spike. Whilst this example invites the response “but surely someone would notice”, it is a fact of life that it’s nobody’s job to notice. The bank providing the trade financing might, but normally it sees only a percentage of the volume of any industry, as firms spread their risk over multiple banks. So it’s not the bank. The manufacturer more than likely sells through a number of wholesalers (NEFF, Bosch and Siemens are all made in the same factory after all), so unless the unregulated manufacturer is looking into why it is so fortunate that its entire dropped Russian order book has been picked up by Belarus, there is no way to spot this with casual surveillance, without AI/ML.
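The washing-machine example can be made concrete with a very small spike detector. To be clear about what this is: a simple statistical threshold against a trailing median, not a trained ML model, and the function name, window size and threshold ratio are assumptions chosen for the sketch. The monthly figures other than 5,000 and 105,000 are invented.

```python
from statistics import median


def find_spikes(monthly_orders, window=6, ratio=3.0):
    """Flag months whose orders exceed `ratio` times the trailing median.

    Returns a list of (month_index, orders, baseline) tuples.
    """
    spikes = []
    for i in range(window, len(monthly_orders)):
        baseline = median(monthly_orders[i - window:i])
        if baseline > 0 and monthly_orders[i] > ratio * baseline:
            spikes.append((i, monthly_orders[i], baseline))
    return spikes


# Six ordinary months of roughly 5,000 units, then the post-sanctions month.
orders_to_belarus = [5000, 5100, 4900, 5000, 5050, 4950, 105000]
spikes = find_spikes(orders_to_belarus)
```

The detection itself is trivial. The hard part, as the paragraph above argues, is that no single bank or wholesaler holds the whole series, so the value comes from assembling the data in one place where a check like this can be run at all.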
While Big Data analytics has the potential to provide significant benefits to defence and security operations, it also presents several challenges. One of the biggest is the sheer volume of data that must be processed and analysed; it requires the cloud, it requires master data management, and it requires expensive and rare data scientists. A common approach to mitigating these challenges is collaboration with industry. The gold standard could be seen as Palantir and the UK’s Quantexa. Once you have the data, collaboration, meaning the sharing of data between different agencies and departments, is essential to Big Data analytics in defence and security operations. By sharing data, the military and civil powers can gain a more complete picture of potential threats and better protect national security interests. To that end, lawmakers have a complicated responsibility to the people, both to protect their privacy and to protect them from harm, and the UK GDPR (together with the Data Protection Act 2018) certainly should be reviewed.
In conclusion, data is a valuable resource and is here to stay. To exploit it, a legal framework should be created to allow security forces to leverage Big Data analytics successfully and gain decision-making advantages in warfare. Artificial intelligence is necessary to process and extract value from Big Data, and context is crucial in making data meaningful and actionable. Entity Resolution is important in working with Big Data to identify and corroborate individuals and entities, and should be done before attempting Network Analysis to unveil hidden relationships. To do this, investment in advanced tools and technologies, and training in their use, will play an increasingly important role in producing real-time or near-real-time intelligence.