Why Metaphor Matters in NLP
What figurative language reveals about human experience and how machines can learn to listen
Dr. Stella Bullo
Introduction
Do machines know how to listen? They can autocomplete a sentence, summarise a news article, even generate poetry. Yet when language stretches beyond the literal into the figurative, systems begin to falter. This blind spot matters because metaphor is where lived experience becomes visible.
Work across linguistics and psychology has shown that metaphor is part of cognition, not decoration. Lakoff and Johnson described how metaphor structures thought. Gibbs examined figurative processing in everyday understanding. In health communication, Elena Semino and colleagues explored how images of battle or journey influence how people talk about illness and how clinicians respond. Computationally, resources such as the VU Amsterdam Metaphor Corpus and evaluations in SemEval have pushed metaphor detection and interpretation forward. These traditions highlight both how pervasive metaphor is and how difficult it is to model.
This article builds on that research and adds a practical dimension, asking what happens when a linguist becomes a developer and designs tools that take metaphor seriously.
What metaphor reveals
Metaphor is a way of making experience graspable when literal words fall short. In clinical communication, a person who says “it feels like barbed wire around my spine” is trying to convey constriction, sharpness, and persistent discomfort. Someone describing asthma as “a fist closing inside my lungs” communicates suffocation and panic more vividly than a numeric scale. Metaphor turns the intangible into something shareable. If NLP overlooks this language, it overlooks signals that matter for care.
Where metaphor affects NLP tasks
Ignoring metaphor is not neutral. It undermines core applications and weakens outcomes.
- Sentiment and emotion analysis. Figurative wording often signals heightened affect. If a system treats “a storm in my head” as weather talk, it will miss distress or agitation.
- Information extraction. Symptoms and attributes often appear through imagery, not only keywords. “It burns like fire” encodes temperature and sensation without naming fever or heat.
- Summarisation. A good summary preserves experiential meaning, not only clinical facts. Stripping metaphor risks flattening the person’s voice.
- Classification and triage. Risk assessment improves when metaphor is treated as evidence. “Like knives twisting” can indicate severity more clearly than a number on a scale.
- Search and retrieval. Conceptual normalisation enables better matches. A system should connect “crushing weight on my chest” with “chest pressure” even if the words differ.
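The retrieval point above can be made concrete with a small conceptual normalisation sketch: instead of matching surface words, queries and documents are mapped to shared clinical concepts. The concept names and patterns below are illustrative assumptions, not a validated lexicon.

```python
import re

# Illustrative concept index: each clinical concept lists literal terms and
# figurative patterns that should retrieve it (hypothetical examples only).
CONCEPTS = {
    "chest pressure": [
        r"\bchest pressure\b",
        r"crushing weight on (?:my|the) chest",
        r"fist closing (?:inside|in) (?:my|the) (?:chest|lungs)",
    ],
    "severe pain": [
        r"\bsevere pain\b",
        r"knives twisting",
        r"barbed wire around (?:my|the) spine",
    ],
}

def concepts_in(text: str) -> set[str]:
    """Return clinical concepts matched either literally or figuratively."""
    found = set()
    for concept, patterns in CONCEPTS.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            found.add(concept)
    return found

# Both queries resolve to the same concept, so retrieval can connect them.
print(concepts_in("a crushing weight on my chest"))   # -> {'chest pressure'}
print(concepts_in("persistent chest pressure"))       # -> {'chest pressure'}
```

Because matching happens at the concept level, "crushing weight on my chest" and "chest pressure" become interchangeable for search, which is exactly the normalisation the bullet describes.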
How linguistics can guide better models
Linguistics offers tools that make metaphor interpretable rather than opaque.
- Taxonomies. Metaphors cluster into families such as heat, intrusion, force, predation, container, and weight. Grouping them helps models generalise beyond single phrases.
- Patterns and cues. Certain verbs, nouns, and constructions reliably flag figurative usage. Collecting them creates rule-based anchors for detection.
- Pragmatics. Who is speaking, to whom, and with what purpose matters. Figurative choices differ in a consultation, a private message, or a public forum.
- Appraisal and stance. Boosters, hedges, and evaluations calibrate intensity and certainty. “It is unbearable” signals more urgency than “it is uncomfortable.”
- Cross-register variation. Communities differ in their figurative choices. Dialect, age group, and professional background all influence which metaphors feel natural.
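The taxonomy idea above can be sketched as a small JSON resource keyed by metaphor family. The family names follow the text; the cue words are illustrative placeholders rather than a curated lexicon.

```python
import json

# Minimal taxonomy sketch in JSON, one entry per metaphor family.
# Cue lists are illustrative assumptions, not the project's actual resource.
TAXONOMY_JSON = """
{
  "heat":      {"cues": ["burn", "fire", "scald", "sear"]},
  "intrusion": {"cues": ["knife", "needle", "barbed wire", "drill"]},
  "force":     {"cues": ["crush", "squeeze", "grip", "press"]},
  "predation": {"cues": ["gnaw", "bite", "claw", "devour"]},
  "container": {"cues": ["trapped", "closing in", "bursting"]},
  "weight":    {"cues": ["heavy", "dragging", "leaden"]}
}
"""

taxonomy = json.loads(TAXONOMY_JSON)

def families_for(phrase: str) -> list[str]:
    """Return every family whose cues appear in the phrase."""
    phrase = phrase.lower()
    return [fam for fam, entry in taxonomy.items()
            if any(cue in phrase for cue in entry["cues"])]

print(families_for("it burns like fire and feels like barbed wire"))
# -> ['heat', 'intrusion']
```

Grouping cues under families is what lets a model generalise: a phrase it has never seen can still be recognised as, say, an intrusion metaphor because it shares a cue with phrases it has.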
From research to tooling
These principles are not only theoretical. We translated them into a compact prototype: a metaphor tagger and a small web application. The goal was not poetic analysis but practical communication, hearing what a person tries to say and producing outputs useful to both patient and clinician.
To make this possible we designed a stack and workflow that turn linguistic principles into a functioning system.
Stack
Python handles text processing, spaCy provides tokenisation and lemmatisation, curated regular expressions deliver high-precision pattern matching, and a taxonomy stored in JSON keeps categories maintainable. Flask serves an API and a simple interface. Small Transformer components are added only when disambiguation requires extra context.
Pipeline
Input text is normalised and segmented. Regex and lexicon rules propose metaphor candidates and attach labels from the taxonomy. Context windows are passed to a light classifier when needed. Each candidate is linked to experiential hints and to plain clinical paraphrases. The system produces two summaries, one written for the patient and another written for the clinician, both printable and ready for consultation.
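The stages above can be compressed into a short sketch. The rules, experiential hints, paraphrases, and summary wording are illustrative assumptions, and a naive regex splitter stands in for spaCy's segmentation.

```python
import re

# Illustrative rules: (pattern, taxonomy label, experiential hint,
# clinical paraphrase). These entries are assumed examples only.
RULES = [
    (r"fist closing inside my lungs", "container/force",
     "suffocation, panic", "chest tightness with anxiety"),
    (r"knives twisting", "intrusion",
     "sharp, severe pain", "severe stabbing pain"),
]

def segment(text: str) -> list[str]:
    """Naive sentence segmentation; spaCy handles this in the real stack."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tag(sentence: str) -> list[dict]:
    """Propose metaphor candidates and attach taxonomy labels."""
    hits = []
    for pattern, family, experiential, clinical in RULES:
        if re.search(pattern, sentence, re.IGNORECASE):
            hits.append({"family": family,
                         "experiential": experiential,
                         "clinical": clinical})
    return hits

def summarise(text: str) -> tuple[str, str]:
    """Produce the two summaries: one for the patient, one for the clinician."""
    hits = [h for s in segment(text) for h in tag(s)]
    patient = "; ".join(h["experiential"] for h in hits)
    clinician = "; ".join(h["clinical"] for h in hits)
    return patient, clinician

patient, clinician = summarise(
    "It's like a fist closing inside my lungs. Then knives twisting.")
print(clinician)  # -> chest tightness with anxiety; severe stabbing pain
```

The split into two outputs mirrors the design goal: the patient summary preserves the experiential hints, while the clinician summary carries the plain clinical paraphrases.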
Why hybrid rules help
Rules make decisions interpretable and keep false positives low. They provide transparency about why an expression was tagged. The classifier resolves boundary cases and register shifts that rules alone cannot capture. The combination provides speed, clarity, and enough flexibility to work across speakers and contexts.
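The rule-first, classifier-fallback control flow might look like the sketch below. Here `classify` is a stand-in for the light Transformer component and simply illustrates where it slots in; the patterns are assumed examples.

```python
import re

# High-precision rules fire first; ambiguous cues fall through to a model.
HIGH_PRECISION = [r"barbed wire around (?:my|the) spine"]
AMBIGUOUS_CUES = [r"\bstorm\b"]  # "storm in my head" vs. literal weather talk

def classify(sentence: str) -> bool:
    """Placeholder for the light classifier; a real model scores context."""
    return "in my head" in sentence

def is_metaphor(sentence: str) -> tuple[bool, str]:
    """Return (decision, source) so every tag is traceable to its evidence."""
    for pattern in HIGH_PRECISION:
        if re.search(pattern, sentence, re.IGNORECASE):
            return True, "rule"          # interpretable, low false positives
    for pattern in AMBIGUOUS_CUES:
        if re.search(pattern, sentence, re.IGNORECASE):
            return classify(sentence), "classifier"   # boundary case
    return False, "none"

print(is_metaphor("a storm in my head"))     # -> (True, 'classifier')
print(is_metaphor("a storm hit the coast"))  # -> (False, 'classifier')
```

Returning the decision's source alongside the decision is what keeps the hybrid transparent: every tagged expression can say whether a rule or the classifier produced it.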
Conclusion
Figurative language is not a problem to correct. It is a resource to understand. In healthcare communication, metaphors carry clues about pain, urgency, and emotion that are too important to ignore. When NLP treats metaphor as signal, summaries become more faithful, extraction becomes more complete, and recommendations become more humane.
Linguistics brings models closer to meaning by making patterns explicit, embedding pragmatics, and honouring variation. Tools like a metaphor tagger and communication focused apps show how this integration works in practice. They show that machines can be taught not only to parse words but to listen for what lies behind them. If we want NLP to support care, mental health, education, and other domains where human stakes are high, we need systems that take figurative language seriously. Humane technology begins by listening first and processing second.