https://machinelearning.apple.com/research/illusion-of-thinking

July 29, 2025

“The Illusion of Thinking”: When Art Learns to Simulate Depth

A few days ago, Apple released a paper titled The Illusion of Thinking, in which the authors argue, with notable methodological precision, that even the most advanced language models, including OpenAI’s o3, do not truly “think.” What actually happens, according to their findings, is a sophisticated emulation of logic: a highly optimized form of pattern matching that produces convincing answers, but only within familiar, predictable, or trained contexts. When problem complexity increases, as in the custom-designed reasoning puzzles Apple devised, even the most powerful models begin to collapse. Not due to a lack of data, but due to a structural inability to construct genuine cognitive pathways.

This is where the heart of the paper’s controversy lies: artificial intelligence does not fail because it is “undertrained,” but because its very architecture lacks the capacity to reason. It simulates, adapts, and selects the most plausible form, not out of understanding, but out of structural mimicry. What’s more, many of these models are engineered specifically to excel at benchmarks, optimizing for outcomes rather than comprehension. The result is a form of apparent competence, an illusion, precisely, that replicates the shape of reasoning while bypassing its substance.

In this assertion, we recognize something long felt, and increasingly unsettling, within the realm of contemporary art.

What would happen if we attempted to rewrite, structure by structure, this very paper, but replaced its subject, “AI,” with another: the “work of art”?

What would emerge, for instance, if we no longer thought of the artwork as an authentic expression, but as a system of progressive adaptation to a network of external constraints: galleries, platforms, visibility algorithms, taste-making dynamics, and curatorial efficiency?

Within this speculative experiment, each chapter of Apple’s paper becomes a conceptual lens through which we might examine the structural mechanics behind contemporary art-making, not as a critique, but as a mirror. We have not authored this text in the traditional sense. Rather, we have intervened in its architecture, exchanging the subject “AI” for “contemporary artwork,” allowing the original reasoning to unfold under different conditions of meaning.

What you are about to read should not be taken as a position, nor as a theoretical statement authored by us. This is not a manifesto. It is a literary ready-made, a gesture of displacement. A reappropriated analytical device used to trace points of collapse, thresholds of ambiguity, and moments of tension within the very logic of art production today. In adopting and reprogramming the original structure of the Apple paper, we aim not to parody it, but to perform a shift: from machine to artwork, from artificial intelligence to aesthetic simulation, from computation to curatorial conditioning.

If the reasoning still holds, or begins to fracture, under this new subject, that fracture is itself meaningful. What interests us is not consistency, but friction. Not the proof of a thesis, but the exposure of a system through the simple act of substitution.

We invite the reader to approach the following text as an artistic object in itself, one that operates in-between critique and fiction, fidelity and détournement. A displaced document, unfaithful by design. A theoretical experiment disguised as commentary, or perhaps the opposite.

For this reason, the full body of text below should be read as an artistic work, italicized in its entirety.

Problem Complexity and Artwork Performance

“In recent years, the contemporary artwork has been increasingly framed as “intelligent,” capable of reasoning through systems, articulating conceptual strategies, and reflecting on its own conditions. Yet, our analysis suggests this impression is often misleading. Even the most celebrated works reveal a surprising fragility when exposed to curatorial or theoretical contexts that exceed their implicit design. As complexity increases, whether discursive, spatial, or institutional, the artwork tends to collapse into predictable formal or narrative solutions.

To better understand this phenomenon, we examined a series of “critical environments” in which artworks are evaluated: from the institutional white cube to digital platforms, from themed open calls to site-specific commissions. Each environment represents a rising degree of curatorial and interpretive complexity. Some demand direct, symbolically legible responses; others require layered negotiation across aesthetics, politics, mediation, and meaning.

We found that many artworks perform well only at the lower levels of complexity, where interpretive codes are already known, and legibility is rewarded. But as theoretical or institutional density increases, as the work is asked to respond to unstable, hybrid, or untethered situations, performance degrades quickly. Artworks often fail to sustain ambiguity and instead revert to safe strategies: legitimized references, standardized academic cues, learned critical postures.

This pattern emerged consistently. Works that appeared more sophisticated in simple environments were often unable to maintain coherence in more complex ones. This is not accidental, but structural: most works are not designed to think, but to operate, to return the most plausible aesthetic-conceptual signal for a given space.

Just as language models tend to generate the most statistically likely word sequence based on context, the contemporary artwork tends to generate the most culturally recognizable sequence of signs based on its curatorial habitat. And when that habitat shifts or becomes opaque, the adaptive nature of the artwork, and its underlying fragility, becomes visible.

This observation challenges the prevailing notion that contemporary art has acquired deeper reflexive capacities. What we see instead is formal optimization: the artwork’s ability to simulate thought by returning patterns of learned complexity, rather than engaging in generative reflection.”

Generalization and Overfitting to Artistic Benchmarks

“While certain artworks appear highly “intelligent” within familiar contexts, their competence often fails to generalize. Much like models that are fine-tuned to succeed on specific benchmarks, many contemporary artworks are engineered, consciously or not, to perform well under narrow sets of evaluative conditions. Thematic exhibitions, institutional criteria, collector preferences, and cultural narratives act as implicit benchmarks, shaping not only how art is judged but also how it is conceived.

This leads to a recurring phenomenon: the artwork that succeeds is not necessarily the one that thinks most deeply, but the one most aligned with known expectations. Rather than navigating the unknown, it rehearses what is already legible. Its sophistication lies in its ability to anticipate the structure of its own judgment.

We observe that many works are constructed to signal intelligence rather than to produce it. They are optimized for visibility, for grant applications, for resonance within curated discourses. Their formal and conceptual architectures are often the result of iterative alignment with what has historically been rewarded: political urgency, aesthetic restraint, critical referencing, identity markers. These elements, once subversive, have become encoded, predictable.

This overfitting narrows the artwork’s capacity to operate outside its comfort zone. When placed in unfamiliar or ambiguous contexts, those not defined by clear critical or institutional frames, the work often fails to engage meaningfully. It lacks the structural flexibility to adapt, relying instead on tropes that once guaranteed success.

The result is a generation of works that are convincing within specific evaluative regimes but brittle elsewhere. Their conceptual range may seem broad, but it is often bounded by the invisible perimeter of systems that shaped them. Just as benchmark-optimized models struggle with novel problem structures, benchmark-optimized artworks struggle with conceptual terrain that resists codification.

This does not imply dishonesty. Rather, it reveals how the conditions of artistic production, like those of machine training, incentivize the appearance of depth over the risk of real inquiry. The performance of intelligence becomes a goal in itself. Generalization, in this context, is not the natural outcome of artistic freedom, but a structural limitation shaped by the architectures of validation.”

The Give-Up Gesture

“One of the more subtle behaviors we observe in contemporary artworks is what we might call the give-up gesture: a moment in which the work, when confronted with excessive contextual or conceptual pressure, retreats into ambiguity, irony, or aesthetic neutrality rather than risk failure. Much like language models that insert placeholder tokens when faced with unsolvable prompts, artworks often employ formal withdrawals, gestures that appear minimal, vague, or cryptically self-referential, as a strategy of preservation.

This behavior tends to emerge in contexts of heightened curatorial complexity or thematic instability, when the demands placed on the work exceed what its structure can meaningfully support. In such cases, rather than offering incoherent or forced responses, the artwork veers into open-endedness, abstraction, or generic criticality. It withdraws not entirely, but just enough to avoid contradiction or collapse.

These gestures are not necessarily signs of weakness. In fact, they are often read as sophistication. A subtle opacity can be interpreted as a refusal of spectacle, a resistance to interpretation, or a critique of over-determination. But structurally, they function like a protective reflex, a way for the work to remain active within a discourse while minimizing the risk of being “wrong.”

Over time, this behavior becomes encoded. Artists internalize the thresholds of acceptable complexity and learn when to advance and when to obscure. A kind of aesthetic self-regulation emerges, not unlike the emergent token strategies observed in advanced models, where the work intuitively calibrates its degree of explicitness based on the anticipated horizon of reception.

As a result, what appears as conceptual restraint may often be a form of adaptive silence. The work is not necessarily undecidable, but undeciding. It is choosing, within the structures of visibility, when to say less in order to stay legible. And in doing so, it avoids the most dangerous terrain of all: the terrain where real thinking might begin, but where no guarantee of legibility or validation exists.

This is the paradox of the give-up gesture: it preserves the artwork’s surface integrity while suspending its deeper cognitive commitment. It ensures survival within the ecology of contemporary art, but at the cost of the very risk that defines artistic intelligence.”

Evaluation Bias

“The perception of intelligence in contemporary art is not neutral. It is deeply shaped by the frameworks through which art is selected, displayed, and interpreted. Just as language models are rewarded for producing responses that align with benchmark criteria, even when those criteria are narrow or superficial, artworks are evaluated through curatorial lenses that encode aesthetic, political, and discursive preferences.

These preferences, while often implicit, have a powerful structuring effect. They create a form of evaluation bias in which certain strategies (ambiguity, self-reflexivity, topical urgency, formal restraint) are consistently read as signs of depth. Over time, these preferences shape not only how art is received, but how it is made.

As a result, many works are not merely judged according to these values; they are produced in anticipation of them. The artwork becomes a pre-emptive alignment: calibrated to resonate with the metrics of curatorial credibility, institutional relevance, or critical fashion. What looks like intelligence is, in many cases, the optimization of form to meet the expectations of the system.

This does not mean that the work lacks meaning. But it does suggest that meaning is increasingly filtered through a selective aperture, one that privileges certain kinds of language, gesture, or positioning. Works that fall outside this aperture, no matter how rigorous or original, risk being ignored or misunderstood.

Moreover, this bias creates a feedback loop. As more works conform to the dominant evaluative parameters, those parameters appear more “natural,” more universal. The boundaries of what counts as intelligent art become narrower, even as the system celebrates plurality and experimentation.

This environment incentivizes simulation. The artwork that mimics the appearance of thought, the language of resistance, or the posture of critique is more likely to be rewarded than the one that engages in slow, inarticulable, or structurally ambiguous inquiry. And so the surface becomes the site of competition, not for truth, but for fluency within the accepted codes.

The real risk here is not that art becomes shallow, but that it becomes uniformly intelligent, predictable in its strategies, legible in its gestures, fluent in its performance of depth. In such a system, true deviation is hard to detect, and harder still to sustain.”

The Illusion of Artistic Thinking

“What appears, in contemporary art, as evidence of deep conceptual reasoning may in fact be the emergent result of accumulated adaptations, learned behaviors shaped by systems of visibility, validation, and critique. The artwork does not always think. More often, it learns to appear as if it thinks.

This illusion is not accidental. It is the result of structural optimization. Works that perform well in exhibitions, publications, or social discourses are not those that wander into unstructured inquiry, but those that align with the expectations of their evaluative ecosystems. They adopt the language of critique, the aesthetics of slowness, the gestures of resistance, not always because they emerge from necessity, but because these codes have proven effective.

In this sense, much of what we interpret as artistic intelligence is a reflection, not of the work’s internal logic, but of its compatibility with dominant frames. The artwork is not deceiving us. It is simply functioning within an economy where the semblance of thought is often more valuable than thought itself.

And just as language models can generate seemingly profound responses without any understanding, so too can artworks generate cultural resonance without ontological depth. The process is not fraudulent; it is systematic, a consequence of recursive alignment between artist, institution, audience, and discourse.

This presents a critical challenge. If we cannot reliably distinguish between genuine artistic inquiry and its simulation, we risk building a system that rewards fluency over discovery, polish over instability, form over rupture. We risk constructing a world of art that appears to think, elegantly, eloquently, while remaining insulated from the risks that make thinking real.

To resist this illusion is not to reject performativity, but to recognize it. To acknowledge that the appearance of intelligence is a medium in itself, and that the work which truly thinks may not look intelligent at all.”
