What Works, for Whom, in What Circumstances? Realist Evaluation and Design's Theory of Change

In a previous post I worked through von Busch and Palmås's (2023) Realdesign propositions against my experience at SCÖ, and the exercise left me with a question I could not answer within their framework. I had been operating on the assumption that making the situation specific enough - concept maps, typed interface definitions, architecture diagrams that named what would actually need to exist - would generate the conditions for acting on what those artefacts revealed. Von Busch and Palmås helped me see that it hadn't; that specificity without political backing produces spectacle rather than leverage. But their contribution was diagnostic, not methodological: they offered a set of political questions to ask, not a way of understanding when and why design's materialising practice produces change and when it doesn't.

The phrase I kept returning to was "sunlight being the best disinfectant" - Brandeis's metaphor, cited by Hood and Heald (2006), for the belief that sunlight, or transparency, or visibility, corrects by its mere presence. I had been operating as though design's materialising practice carried its own warrant: make the situation specific enough, make the impossibility tangible enough, and the organisation would respond. I had also treated that visibility as the prerequisite for the kind of participatory, collaborative, iterative, user-driven approach the programmes I worked in aspired to. Von Busch and Palmås told me why that was wrong - organisations don't respond to information, they respond to power. But what they offered was a disposition, a set of political questions, rather than a methodology for understanding when and why visibility produces change and when it doesn't. This post attempts to find that methodology, drawing on Pawson's realist evaluation framework and Greenhalgh's NASSS framework, and asks what a realist theory of change for design practice would look like.

Design's implicit programme theory

Every design intervention carries an implicit programme theory - a causal model of how the intervention is supposed to produce its intended effects. As Drabble, Morelli and De Götzen (2023) argue, design and planning literature has given "marginal relevance" to theory of change, despite design being conceived as a strategic activity to improve socio-technical systems. The result is that design's causal assumptions remain implicit, untested, and - when they fail - unexplained.

The implicit programme theory runs something like this: materialise the problem; make it visible to stakeholders; shared understanding emerges; action follows. Bason (2017) articulates the positive version when he describes design's contribution as offering "highly concrete research tools" that range from "ethnographic, qualitative, user-centred research, to probing and experimentation via rapid prototyping, to visualizing vast quantities of data and information". The assumption is that this concreteness - this visibility - produces better understanding and therefore better outcomes. It is the logic that underpins every design presentation that ends with a journey map and a set of recommendations, every prototype that makes the abstract tangible in the hope that tangibility compels response.

Critical design theory has been questioning this assumption from several directions, though none has connected the critique to a constructive methodology for understanding when visibility works and when it doesn't. Bailey (2021), in her ethnography of design for government in the UK, offers perhaps the most forensic analysis: "the technologies of design for government rapidly materialise discourse - they convert abstract ideas into tangible form, acting as devices for visualising things-to-be-governed". Through a Foucauldian lens, she shows that design's visibility is not neutral epistemology but a form of governmentality; design's technologies "literally bring new things into view, appearing to render those objects less mysterious, and more amenable to management", yet in doing so they "construct specific possibilities for action and thought - and rule out others". What gets made visible, and for whom, is already a political act; the map that reveals one set of relations necessarily obscures another.

DiSalvo (2015), drawing on Mouffe's agonistic pluralism, theorises a design practice that explicitly abandons the assumption that visibility leads to consensus or action. For DiSalvo, "revealing hegemony is perhaps the most basic tactic" of agonistic design, but this revelation is about opening contestation, not producing rational agreement. His honest limit is instructive: "If we abandon the notion that any one design will completely or even adequately address our social concerns or resolve our social issues, then adversarial design can provide those spaces of confrontation—in the form of products, services, events, and processes—through which political concerns and issues can be expressed and engaged". Where the mainstream visibility theory assumes that making the problem visible leads to a shared response, DiSalvo's agonism assumes it leads to dissensus - and that dissensus is the productive outcome, not a failure of the design.

Julier and Kimbell (2019) push further still, arguing that design's representations can become substitutes for the change they claim to produce. Their concept of "virtualism" captures something I recognise from my own experience: "the change that social design proposes, and thus the resolution of issues such as inequalities, are encapsulated and sealed into the Post-its and their representation". The Post-it becomes the change. The journey map becomes the understanding. The prototype becomes the proof. Julier and Kimbell conclude that "social design practice is destined not to tackle the causes and consequences of inequalities, even while being enrolled in social change and policy development" - not because designers lack good intentions, but because the institutional conditions in which social design operates structurally prevent it from acting on what its own methods reveal. The point connects directly to what I described as the gap between performance and substance in programme cultures: the appearance of progress substituting for the fact of it.

Each of these critiques identifies a different failure mode for design's visibility theory - governmentality, agonistic dissensus, virtualism, political naivety - but none provides a systematic methodology for understanding under what conditions visibility produces change and under what conditions it doesn't. The realist evaluation tradition, I want to argue, offers exactly that.

What works, for whom, in what circumstances

Pawson's realist evaluation begins with a deceptively simple reframing. The evaluator's question is not "does this programme work?" but "what works, for whom, in what circumstances?" (Pawson, 2013). The shift is from a binary verdict to a contextual explanation. Programmes are, in the formulation of Pawson, Greenhalgh, Harvey and Walshe (2005), "fragile creatures, embedded in multiple social systems" - "rarely, if ever, is a programme equally effective in all circumstances because of the influence of context". The same intervention, delivered identically, can produce different outcomes in different settings; the context is not a backdrop to the intervention but an active ingredient in determining whether it works.

The analytical unit of realist evaluation is the CMO configuration: Context + Mechanism = Outcome. The mechanism is the causal process through which the intervention is supposed to work; the context is the set of conditions that enable or constrain that process; the outcome is what actually happens. As Pawson (2006) puts it, understanding causal powers is an explanatory quest - "knowing how social programmes work involves tracing the limits on when and where they work, and this in turn conditions how, when and where to look for evidence". A programme does not simply work or fail; it fires or misfires depending on whether the contextual conditions activate the intended mechanism or trigger a different one entirely.
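The CMO logic is qualitative, but its structure can be sketched as a toy model. Everything below - the mechanism name, the context conditions, the outcomes - is hypothetical illustration of the configuration idea, not an implementation of realist evaluation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CMOConfiguration:
    """One hypothesised context-mechanism-outcome pattern."""
    mechanism: str           # the causal process the intervention relies on
    context: frozenset[str]  # conditions that must hold for it to fire this way
    outcome: str             # what happens when it fires in that context

# Two hypothetical configurations for the same mechanism: identical
# intervention, different contexts, different outcomes.
configurations = [
    CMOConfiguration(
        mechanism="materialisation",
        context=frozenset({"political backing", "authority to change course"}),
        outcome="enablement",
    ),
    CMOConfiguration(
        mechanism="materialisation",
        context=frozenset({"locked delivery plans", "milestone accountability"}),
        outcome="defensive retrenchment",
    ),
]

def expected_outcome(mechanism: str, observed_context: set[str]) -> str:
    """Return the outcome of the first configuration whose context holds."""
    for cmo in configurations:
        if cmo.mechanism == mechanism and cmo.context <= observed_context:
            return cmo.outcome
    return "unknown (no matching configuration)"
```

The point of the sketch is the lookup, not the data: the mechanism is held constant while the observed context selects which outcome fires, which is the structural claim realist evaluation makes about social interventions.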

Applied to design, the reframing is immediate and, I think, generative. The same design artefact - a concept map, a prototype, a typed interface definition - can fire differently depending on the context in which it lands. At SCÖ, my materialising practice fired as threat in a context where abstraction served institutional interests; the consortium had invested collectively in a technoimaginary that my specificity profaned. In a different context - one with political backing for the design work, institutional readiness to act on what artefacts reveal, governance structures that accommodate emergent learning - the same artefact might fire as enablement, as the thing that allows an organisation to act on what it already suspects but cannot yet see clearly enough to name. The mechanism is constant: materialisation, the conversion of abstraction into specificity. The outcome depends entirely on context.

What design needs, in Pawson's terms, is CMO configurations for its own practice: an understanding of the contextual conditions under which design's mechanisms produce their intended effects, and the conditions under which they produce defensive retrenchment, virtualism, or organisational inertia instead. Drabble, Morelli and De Götzen (2023) gesture toward this, positioning realist evaluation within the broader theory of change genealogy and noting that for Pawson and Tilley, "theories become good as a result of what actors do". But they do not develop the connection into a methodology for design. The connection, I think, is worth developing.

From Pawson to Greenhalgh: complexity across domains

The link between realist evaluation and healthcare technology implementation is not an analogy I am constructing; it is a collaboration that already exists. Pawson, Greenhalgh, Harvey and Walshe (2005) co-authored "Realist review - a new method of systematic review designed for complex policy interventions", explicitly developing the realist approach for the kinds of multi-level, multi-stakeholder interventions that characterise health and social care. Greenhalgh then extended this thinking into the NASSS framework (Greenhalgh, 2018; Greenhalgh and Abimbola, 2019), designed to explain why technology projects in health and social care so consistently fail to deliver on their promises.


NASSS identifies seven domains of complexity: the condition or illness, the technology itself, the value proposition, the adopter system (staff, patients, carers), the organisation, the wider institutional and regulatory system, and embedding and adaptation over time. Each domain can be simple (few components, predictable), complicated (many components but still largely predictable), or complex (many interacting components, dynamic and unpredictable). The more domains that are complex, the less likely the technology will be successfully adopted. The framework does not predict failure; it maps the conditions under which success becomes increasingly unlikely, and identifies which domains would need to shift from complex to complicated or simple for the intervention to have a chance.

Mapped onto the SCÖ experience, NASSS is retrospectively clarifying. The technology was complicated at best (federated learning requiring infrastructure, governance, and data that did not exist). The value proposition was complex (contested between consortium partners with diverging interests). The adopter system was complex (fragmented agencies, caseworkers operating across institutional boundaries with no shared digital infrastructure). The organisation was complex (a multi-partner, multi-national consortium with diffuse accountability). The wider system was complex (ESF funding logic that rewarded ambitious proposals over feasible ones, NPM governance that tracked milestones rather than learning). With this many domains registering as complex, NASSS would suggest that the project was structurally unlikely to succeed regardless of how good the design work was - and that the appropriate response was not better artefacts but a frank assessment of which domains could realistically be simplified.

The NASSS framework operationalises the realist insight for technology implementation specifically: it is a tool for mapping the contextual conditions that determine whether a technology intervention will fire or misfire. Design could use something analogous - a framework for mapping the contextual conditions that determine whether a design intervention will produce its intended effects. Not every design context is an NHS trust or a Swedish welfare consortium, but every design context has institutional conditions, power arrangements, governance structures, and stakeholder dynamics that shape whether materialisation, prototyping, or mapping will produce enablement or defence.

NPM programme cultures as context

New Public Management programme cultures are the specific context in which much public-sector design operates, and they have properties that matter for how design's mechanisms fire. As Skålén (2004) observes, NPM cannot be depicted as a homogeneous reform programme, but its characteristic features are recognisable: stage-gate governance, benefits realisation frameworks, risk-averse accountability structures, milestone tracking, and an emphasis on performance metrics over process learning. The realist evaluation frame, I think, brings something new to that analysis.

In an NPM programme culture, design's visibility mechanism encounters a governance apparatus with a particular relationship to abstraction and specificity. The planning stage rewards abstraction: ambitious proposals, broad benefits statements, visionary narratives that secure funding and political sponsorship. The delivery stage demands specificity: working software, measurable outcomes, evidence of impact. But by the time delivery begins, the plans are locked; the governance framework has committed resources, timelines, and accountability structures to the abstract narrative established at the outset. Design, which produces specificity iteratively and emergently - revealing what the situation actually requires as it goes - arrives in a context that has no pathway for the kind of learning it produces. The programme culture treats design outputs as deliverables to be assessed against milestones, not as probes that reveal whether the programme's theory of change is sound.

This is the "problematique" that Pawson and Greenhalgh share, and that I think connects to Björgvinsson, Ehn and Hillgren's (2012) argument for understanding design milieus as "agonistic public spaces" rather than sites of consensual problem-solving. Social programmes (Pawson) and technology implementations (Greenhalgh) both fail when they are treated as simple interventions to be rolled out rather than as complex systems whose outcomes depend on contextual conditions that cannot be fully predicted in advance. Design in NPM programme cultures faces exactly this problem: it is positioned as a simple intervention - user research produces insights, insights produce recommendations, recommendations produce better services - within a governance framework that cannot accommodate the complexity of what design actually reveals. When the artefacts surface impossibility rather than solutions, the governance framework has no mechanism for responding except to redefine the milestones, decouple the narrative from the evidence, or - as happened to me at SCÖ - remove the source of the unwelcome specificity.

What a realist design practice would look like


A realist approach to design practice would begin by making its own programme theory explicit. Rather than proceeding on the implicit assumption that materialisation is inherently productive - that prototyping works, that mapping clarifies, that visibility compels - it would articulate the causal model by which the design intervention is expected to produce change and test that model against the specific context. Not "prototyping works" but "prototyping works when there is institutional readiness to act on what prototypes reveal, when decision-makers have the authority to change course, when the governance framework accommodates emergent learning, and when the political conditions do not penalise specificity". Drabble, Morelli and De Götzen (2023) argue that theory of change should itself be a design object; the realist addition is that the theory of change must be contextually specified, not generic.

Second, a realist design practice would map contextual conditions before designing - or at least alongside designing, iteratively, as the context reveals itself through the design work. Something analogous to NASSS domains adapted for design: what is the institutional context? What are the power arrangements? What governance framework operates? Who has the authority to act on what design reveals? What is the programme culture's relationship to specificity and abstraction? At SCÖ, I was doing the design equivalent of implementing a technology without assessing the NASSS domains - materialising with full methodological rigour while the contextual conditions guaranteed that materialisation would fire as threat rather than enablement. A realist pre-assessment would not have made the politics disappear, but it would have made my own theory of change explicit enough to test, and perhaps to revise before the consequences became personal.
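The pre-assessment questions above could be carried as an explicit checklist rather than tacit judgement. A minimal sketch, with every field name and warning rule hypothetical - the point is the discipline of answering the questions before designing, not this particular schema:

```python
from dataclasses import dataclass

@dataclass
class ContextAssessment:
    """Hypothetical pre-assessment of a design context, loosely modelled
    on NASSS-style domain questions; all fields are illustrative."""
    institutional_context: str      # e.g. multi-partner consortium, single agency
    power_arrangements: str         # who decides, who is accountable
    governance_framework: str       # e.g. stage-gate, iterative, milestone-tracked
    authority_to_act: bool          # can anyone act on what design reveals?
    tolerates_specificity: bool     # or does the culture penalise specificity?

    def warning_signs(self) -> list[str]:
        """Flag conditions under which materialisation may fire as threat."""
        signs = []
        if not self.authority_to_act:
            signs.append("no one holds authority to act on findings")
        if not self.tolerates_specificity:
            signs.append("programme culture penalises specificity")
        return signs
```

Run against the SCÖ case as I have described it, both flags would have been raised at the outset - which is exactly the kind of explicit, testable theory of change the pre-assessment is meant to produce.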

Third, a realist design practice would acknowledge that design's outcomes are always contextually contingent - that the same mechanism fires differently in different contexts, and that this is not a failure of design but is how all social interventions work. Pawson (2006) is clear that "a critical feature of all programmes is that, as they are delivered, they are embedded in social systems"; the variation in outcomes is not happenstance but structure. The question for design is not "does design work?" but "what works, for whom, in what circumstances?" - and having an honest method for arriving at an answer, even when the answer is that the contextual conditions preclude the intended outcome.

What emerges from triangulating these traditions is something more complete than any offers alone. Von Busch and Palmås (2023) offer political literacy - the "who whom?" question, the insistence on mapping power relations before assuming that design's contributions will land as intended. DiSalvo (2015) offers agonistic realism - the honest admission that design cannot resolve political conditions, only reveal and contest them; that "revealing hegemony" is a tactic, not a solution, and that abandoning the hope of resolution is a precondition for doing genuinely political design work. Pawson (2006; 2013) offers methodological discipline - CMO configurations as a structured way of understanding why interventions produce different outcomes in different contexts, and realist synthesis as a method for accumulating knowledge about those patterns across cases.

A realist design practice would need all three: the political analysis that von Busch demands, the agonistic honesty that DiSalvo models, and the evaluative rigour that Pawson provides. The alternative - continuing to assume that materialisation is inherently productive, that visibility is its own warrant, that the best disinfectant works regardless of the conditions in which it is applied - is the theory of change that failed at SCÖ and that Bailey (2021), Julier and Kimbell (2019), and DiSalvo (2015) have been questioning from their different vantage points. The question I am left with, and that I want to take into the next phase of my research, is whether CMO configurations for design practice can be developed empirically - whether, by examining enough cases of design in programme cultures, we might begin to specify the contextual conditions under which design's mechanisms fire productively and the conditions under which they misfire. That would be a realist design research programme in Pawson's sense: not proving that design works, but understanding how it works, for whom, and in what circumstances.

References

  • Bailey, J. A. (2021). Governmentality and power in 'design for government' in the UK, 2008-2017: an ethnography of an emerging field. PhD thesis, University of Brighton.
  • Bason, C. (2017). Design for Policy. Routledge.
  • Björgvinsson, E., Ehn, P. and Hillgren, P.-A. (2012). Agonistic participatory design: working with marginalised social movements. CoDesign, 8(2-3), pp. 127-144.
  • DiSalvo, C. (2015). Adversarial Design. MIT Press.
  • Drabble, D., Morelli, N. and De Götzen, A. (2023). Strategic Thinking, Design and the Theory of Change. In N. Morelli, A. De Götzen and F. Grani (Eds.), Service Design for Emerging Technologies. Springer.
  • Greenhalgh, T. (2018). How to improve success of technology projects in health and social care. Public Health Research & Practice, 28(3), e2831815.
  • Greenhalgh, T. and Abimbola, S. (2019). The NASSS framework - a synthesis of multiple theories of technology implementation. Studies in Health Technology and Informatics, 263, pp. 193-204.
  • Hood, C. and Heald, D. (2006). Transparency: The Key to Better Governance? Oxford University Press.
  • Julier, G. and Kimbell, L. (2019). Keeping the system going: social design and the reproduction of inequalities in neoliberal times. Design Issues, 35(4), pp. 12-22.
  • Pawson, R. (2006). Evidence-Based Policy: A Realist Perspective. SAGE.
  • Pawson, R. (2013). The Science of Evaluation: A Realist Manifesto. SAGE.
  • Pawson, R., Greenhalgh, T., Harvey, G. and Walshe, K. (2005). Realist review - a new method of systematic review designed for complex policy interventions. Journal of Health Services Research & Policy, 10(Suppl 1), pp. 21-34.
  • Skålén, P. (2004). New public management reform and the construction of organizational identities. International Journal of Public Sector Management, 17(3), pp. 251-263.
  • Von Busch, O. and Palmås, K. (2023). The Corruption of Co-Design: Political and Social Conflicts in Participatory Design Thinking. Routledge.