Recently (February 2021) I was invited to come and chat a bit about my work on the cusp of law and computer science at Cambridge University’s Pembroke College Stokes Society (a student-run fellowship for STEM research, scooping up some STEAM interventions). A brief reflection on my current research reminded me that my focus on computational law and ‘legal tech’ brings together AI and law as two disciplines, practices and things: as research domains, as development practices and as operating systems. AI systems and systems of legal norms both have a performative effect when applied in the real world (the one we share, inhabit and navigate). AI has such effect as its computer code enables some things while disallowing others; law has it as it gives specified legal effect to some of our actions while not to others. Both AI and law shape our shared world and, in the process, they shape us – opening new inroads while closing off others.

When planes incorporate software systems (that may increasingly fall within the scope of AI) we hope they won’t fall from the sky in the course of taking decisions that were meant to save us from such calamities. When legal search engines are used to locate relevant case law, we hope they won’t mislead attorneys about the arguments they might make to support a client’s case, or – if it is the court that employs such software – we hope the court will not be led astray by machine learning algorithms that train on historical data attuned to past circumstances, incapable of using discretion to unfollow rules that violate the integrity of the law. In both cases we need robust systems that save us from the consequences of disruptive turbulence, capable of responding to unexpected perturbations that often require some internal reconfiguration to preserve both functionality and resilience.

In this blog (part I) I discuss what is meant by robust AI (a term of the trade); in the next blog (part II) I will develop the notion of robust law; and in the final blog (part III) I will address the confrontation with data-driven legal search as mediated by machine learning systems, arguing that such ‘legal search’ may either boost or break the robustness of law and the rule of law.

What is robust AI?

Besides being the name of a company at the forefront of AI research (with renowned roboticist Rodney Brooks and neuroscientist Gary Marcus in the lead), robust AI usually refers to computing systems that have been verified mathematically and validated by way of empirical testing, to ensure safety and functionality. The concept of ‘robust’ in AI research is used as the opposite of brittle, where a brittle system is one that may digress from its intended behaviour due to perturbations (or noise). Basically, a brittle system cannot deal with changing circumstances, making deployment hazardous, especially if the resulting unexpected behaviour can cause physical or other harm. One example of a machine learning system’s sensitivity to minor perturbations is the way image recognition systems respond to ‘adversarial machine learning’, where changing just a few pixels results in misclassification. A robust AI system should have ways of side-lining unwarranted responses to unanticipated triggers in its environment, or be capable of developing alternative responses on the spot that allow it to cope adequately with a new situation. This becomes crucial when AI systems are meant to respond in real time, for instance when preventing collision (self-driving cars) or when caring for an elderly person who turns out to be more fragile than anticipated. One can imagine an Internet of Things connecting many systems and devices via cloud, edge or fog computing, requiring intuitive real-time adaptation and vigilance against any weakest link being less than robust, as this could trigger a cascading failure across an entire environment or critical infrastructure.
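To give a minimal sketch of this sensitivity – using an invented linear ‘classifier’ rather than a real image recognition model – one can compute the smallest uniform per-pixel nudge that flips a classification; all weights and inputs below are random, purely for illustration:

```python
import numpy as np

# A toy linear "image classifier": scores = W @ x, class = argmax.
# Weights and input are random, purely illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 64))      # 2 classes, an 8x8 "image" flattened
x = rng.normal(size=64)

scores = W @ x
original_class = int(np.argmax(scores))
other = 1 - original_class

# Smallest uniform per-pixel step that overturns the score margin:
margin = scores[original_class] - scores[other]
diff = W[other] - W[original_class]
per_pixel = margin / np.abs(diff).sum()
x_adv = x + 1.1 * per_pixel * np.sign(diff)   # tiny change to every pixel

adv_class = int(np.argmax(W @ x_adv))
print(original_class != adv_class)            # the classification flips
```

For a linear model this flip is guaranteed by construction; for deep networks the same effect is achieved empirically with gradient-based perturbations that remain imperceptible to humans.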


Robust AI is usually associated with prior and iterative verification, meaning that a reliable model of the system is developed and mathematically proven to behave as claimed. This raises questions about the model, that is, the formal specification of the system, suggesting that it may help to develop different models in order to test alternative interpretations of the system. As the behaviour of a system depends on the affordances of its environment, verification must include environment modelling, and as AI systems often employ machine learning, the inherent dynamics of such systems must be modelled against the background of relevant environments (use cases). This in turn raises questions about the extent to which verification scales and the extent to which it can be built into the design of AI systems.
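To make this vocabulary concrete: for a finite model, ‘mathematically proven’ can amount to exhaustively checking a safety property over every modelled state and every modelled disturbance, rather than sampling behaviour. The thermostat, its rules and its environment model below are all invented for illustration:

```python
# Verification in miniature: exhaustively check a safety property over a
# finite model of a system and its environment, instead of testing samples.
# Hypothetical property: the modelled temperature never leaves [15, 30].

def controller(temp: int) -> int:
    """Return the heating adjustment for the current temperature."""
    if temp < 18:
        return 2      # heat
    if temp > 25:
        return -2     # cool
    return 0          # hold

def step(temp: int, disturbance: int) -> int:
    return temp + controller(temp) + disturbance

# Environment model: disturbances of at most ±1 degree per step.
safe = all(
    15 <= step(t, d) <= 30
    for t in range(15, 31)      # every modelled state
    for d in (-1, 0, 1)         # every modelled disturbance
)
print(safe)  # True: the property holds for the *model*
```

Note that the proof is only as good as the model: if real disturbances exceed ±1, the guarantee says nothing – which is exactly why verification must be complemented by validation.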


Mathematical proof, however, always depends on a model of the system and should not be confused with empirical testing of the behaviour of the system under different circumstances, which is called validation of the system. These circumstances can be constructed in a laboratory or framed by mining data in real-world environments. In both cases such constructions and framings raise myriad questions about the validity, significance and meaning of the results of this ‘testing’, for instance because testing against data is not the same as testing against empirical findings. Those involved in validation should beware of computational empiricism, which replaces but in many ways reasserts logical empiricism. Computational empiricism goes wrong where it assumes that data are equivalent to given facts, whereas in point of fact data are the traces, representations or simulations of whatever has been datafied (whether by way of labelling, sensors or inferencing); they are not reality itself, which remains elusive and thereby datafiable in different ways. In other words: ‘raw data is an oxymoron’.

The other constraints: robust and responsible AI

Robustness is not only relevant in relation to physical harm; it also relates to behavioural targeting in the domains of, for instance, social security fraud, unlawful immigration or medical diagnosis. Robust AI is not only about planes not falling from the sky but also about preventing decisions that fail to treat individual citizens with equal respect and concern, whether due to arbitrary (incomprehensible) decision making or unfair systemic bias.

In the case of automated decision-making systems, it is less obvious whether decisions were incorrect or unfair (noting that being unfair may imply being incorrect). A subdiscipline of fair machine learning is flourishing on the cusp of computer science and the humanities (notably law and ethics), raising hard questions about what ‘fairness’ means in different contexts and to what extent values such as fairness are computable. Arvind Narayanan identified 21 different computational definitions of fairness, which confirms that it is an essentially contested concept – also after being disambiguated, formalised and computed. Such contestability probably becomes more apparent when one is faced with so many different computational versions of the same concept, demonstrating the rich and diverse background of the concept and why choosing one version over another matters. Perhaps a computing system is robust if potentially unfair outcomes have been taken into account during development, based on effective interaction with those who will suffer the consequences of its behaviour, while foreseeing proper procedures for contestation once the system is deployed. Though this will require both disambiguation and formalisation – and thereby a choice for one or more computational definitions of fairness – the agonistic debate involving those concerned should bring these choices to the table and make them contestable. The nature of these choices already clearly indicates the interrelationship between politics, ethics, code and law, as discussed in the final chapter of my Law for Computer Scientists and Other Folk.
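A toy calculation can illustrate why the choice between such definitions matters: on the very same (invented) decisions, one definition – equal opportunity, which compares true-positive rates across groups – is satisfied, while another – demographic parity, which compares selection rates – is violated:

```python
# y: actual outcome, pred: automated decision, g: group membership (0 or 1).
# All values are invented for illustration.
y    = [1, 1, 0, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 0, 1, 0, 1, 1]
g    = [0, 0, 0, 0, 1, 1, 1, 1]

def selection_rate(group):
    idx = [i for i in range(len(g)) if g[i] == group]
    return sum(pred[i] for i in idx) / len(idx)

def true_positive_rate(group):
    idx = [i for i in range(len(g)) if g[i] == group and y[i] == 1]
    return sum(pred[i] for i in idx) / len(idx)

# Equal opportunity: both groups have the same true-positive rate...
print(true_positive_rate(0), true_positive_rate(1))   # 0.5 0.5
# ...yet demographic parity fails: selection rates differ widely.
print(selection_rate(0), selection_rate(1))           # 0.25 0.75
```

Whether this system is ‘fair’ thus depends entirely on which formalisation one adopts – a choice that is political and legal rather than merely technical.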

In a similar vein, the subdomain of explainable machine learning has sprouted, not necessarily to justify decisions in terms of fairness, but rather to better understand where a system may be getting things wrong in the real world, despite high accuracy on the data. As Caruana demonstrated, not knowing the features that determine the outcome of a system may result in disastrous decision-making – despite claims of outperforming human experts. Epistemic opacity is thus not only a problem for due process or fair computing but also for robust AI in the mundane sense of a system having the functionality it is claimed to ‘have’. Nevertheless, explanations are crucial in the domain of fair machine learning and responsible AI, another concept closely aligned with but not synonymous with robust AI. Explanations may contribute to a better understanding of why a specific individual decision was taken, for instance by enabling us to check whether such a decision was legally justified. This does not mean that an explanation of how an algorithm came to its conclusion necessarily justifies a legally relevant decision that was based on its output, as such justification depends on entirely different reasons. Those reasons are qualified as valid in the domain of law, which has other criteria for robustness and other methodologies for testing the validity of the legal norm that determines whether or not a decision was justified.
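Caruana’s point can be shown in miniature with an interpretable model whose learned rule is readable off its coefficients; the features, weights and clinical story below are invented, loosely echoing his well-known pneumonia example:

```python
# A toy interpretable "risk model": a weighted sum of binary features,
# where a higher score means the patient is predicted to be safer.
# All weights are invented for illustration.
weights = {
    "age_over_65": -1.2,
    "low_blood_pressure": -2.0,
    "has_asthma": +0.7,   # looks protective in the historical data...
}
# ...because asthma patients were routinely sent straight to intensive
# care, so their recorded outcomes were better. Reading the coefficient
# exposes the confound; an opaque model of equal accuracy on the same
# data would silently reproduce the dangerous rule.

def safety_score(patient: dict) -> float:
    return sum(w for f, w in weights.items() if patient.get(f))

# The readable (and pathological) rule: asthma alone raises the score.
print(safety_score({"has_asthma": True}) > safety_score({}))  # True
```

High accuracy on the data is fully compatible with a rule that would be disastrous in deployment – which is precisely why epistemic opacity undermines robustness, not just fairness.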

In the next blog (part II of the series on robust AI and robust law), I will develop an understanding of robust law, based on the foundational conceptual structure of the rule of law. This conceptual structure is in turn firmly grounded in speech act theory, affordance theory and post-phenomenological philosophy of technology, highlighting the real-world events and technological infrastructures that make both law and the rule of law. This fits my call for a philosophy of technology for computational law, which inevitably implies a philosophy of technology for modern positive law – taking note of the fact that this is articulated in the technologies of text.