Annotation Across Two Worlds: Linguistics vs. NLP
By Stella Bullo · Updated 4 September 2025
Introduction
Annotation sounds like one thing: adding information to data. But ask a linguist and ask an NLP engineer and you will get two very different answers. For a discourse analyst, annotation is a way of uncovering how language enacts power, identity and metaphor. For an NLP practitioner, annotation is the backbone of training data: short, consistent labels that feed machine learning models.
This piece compares the two traditions, showing how the same text can be read through the lenses of linguistic annotation and AI/NLP annotation.
Personal note
I come at annotation from both sides. My training is in linguistics and discourse analysis, where annotation is a way to unpack identity, agency and metaphor. More recently I have been working with annotation in the context of NLP projects, where the same word means something much more operational: preparing data so models can learn. Living in both of these spaces makes the differences clear, but it also shows how much the two approaches can gain from one another.
What Linguists Mean by Annotation
In linguistics and discourse studies, annotation is interpretive and theory-driven. Analysts draw on frameworks that give them categories to work with, and each framework shines a light on a different layer of meaning.
Systemic Functional Linguistics (SFL) (Halliday & Matthiessen, 2014) helps identify process types (material, mental, verbal, relational) and map how participants enact the roles of actor, sayer or experiencer. This is a way of capturing agency and voice in clauses.
Critical Discourse Analysis and Appraisal theory (Fairclough, 1992; Martin & White, 2005; Wodak & Meyer, 2009) provide tools for looking at stance, evaluation and ideology. Connectives like but or even when are treated seriously, as resources that shift authority or open space for resistance.
Conceptual Metaphor Theory (CMT) (Lakoff & Johnson, 1980) focuses on how abstract experiences are structured through source domains such as weight, battle or movement. Saying push through the pain draws on the physical act of pushing against resistance to frame the experience of illness as a struggle that can be overcome.
The point is to uncover layers of meaning: who is speaking, how they position themselves, what metaphors they use and which social schemas they activate. Annotation here is less about volume and more about depth, linking the small details of grammar or word choice to wider patterns of identity, ideology and lived experience.
Example
“Doctors keep telling me it’s just stress, but I know my body better than anyone. I push through the pain to go to work, even when it feels impossible.”
- Doctors keep telling me… → the patient is positioned as subordinate, doctors hold institutional authority (SFL: verbal process, asymmetry of roles).
- but I know my body better than anyone → adversative stance (Appraisal: counter-expectation), reclaiming epistemic authority.
- I push through the pain → material process, resilience, warrior-like identity, and a metaphor of illness as a barrier (CMT).
- even when it feels impossible → concessive stance (Appraisal: concession), which intensifies the sense of agency.
Using more than one framework is not just a stylistic choice but a sign of rigour. SFL maps roles and processes, CDA and Appraisal uncover stance and ideology, and CMT reveals the metaphorical scaffolding of experience. Taken together, these approaches provide a fuller picture of how language enacts identity and resistance. They also show that linguists do not sit in silos; we borrow, adapt and combine. That interdisciplinarity is essential when annotation is used not only in academic research but also in fields like AI and health communication.
What NLP and ML Practitioners Mean by Annotation
In the AI world annotation is operational. The aim is not interpretation but structured training data that models can learn from. Annotation guidelines are designed for clarity and speed, and the focus is on consistency and scalability. Annotators are expected to tag quickly and uniformly so that the dataset is machine-readable and reliable across huge volumes of examples.
Common types
- Text classification: categories like health, finance or politics, or simple sentiment values such as positive, negative or neutral.
- Entity recognition: spans tagged as names, symptoms, locations or dates.
- Intent classification in conversational AI: utterances marked as book a flight, ask a question or cancel an order.
- Coreference and relations: linking “she” to “Mary,” or doctor to hospital.
- Safety and moderation: harmful or misleading content flagged.
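To make these types concrete, here is a rough sketch of what exported records often look like. The field names, labels and character offsets are invented for illustration, not taken from any particular tool or dataset.

```python
# Illustrative annotation records. Field names, label sets and offsets
# are invented for this sketch, not drawn from any specific tool.

classification_record = {
    "text": "My GP finally referred me to a specialist.",
    "label": "health",          # topic classification
    "sentiment": "positive",    # simple polarity value
}

ner_record = {
    "text": "Doctors keep telling me it's just stress.",
    "entities": [
        {"start": 0, "end": 7, "label": "PERSON_GROUP", "span": "Doctors"},
        {"start": 34, "end": 40, "label": "CONDITION", "span": "stress"},
    ],
}

intent_record = {
    "utterance": "Can I move my appointment to Friday?",
    "intent": "reschedule_appointment",
}
```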
Layers
- Token level: each word tagged for part-of-speech or entity type.
- Span level: multi-word chunks such as chronic pain marked as a single unit.
- Sentence level: entire utterances labelled for sentiment, intent or safety.
- Document level: longer texts tagged for topic or stance.
- Discourse / cross-document: references linked across sentences or documents.
Example
“Doctors keep telling me it’s just stress, but I know my body better than anyone. I push through the pain to go to work, even when it feels impossible.”
- Token level: “Doctors” → PERSON, “stress” → CONDITION, “pain” → SYMPTOM.
- Sentence-level sentiment:
- “Doctors keep telling me…” → negative, dismissive.
- “but I know my body…” → positive/neutral, confidence.
- “I push through the pain…” → mixed, negative experience and positive resilience.
- Document-level stance: overall negative towards medical authority and positive self-assertion.
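Pulled together, the same passage can be stored as a single structured record with one field per layer. This is only a sketch: the schema and label names are illustrative, not a standard format.

```python
# One possible serialisation of the layered annotations above.
# The schema, field names and label values are invented for illustration.

annotated_document = {
    "text": (
        "Doctors keep telling me it's just stress, but I know my body "
        "better than anyone. I push through the pain to go to work, "
        "even when it feels impossible."
    ),
    "token_level": [
        {"span": "Doctors", "label": "PERSON"},
        {"span": "stress", "label": "CONDITION"},
        {"span": "pain", "label": "SYMPTOM"},
    ],
    "sentence_level": [
        {"unit": "Doctors keep telling me...", "sentiment": "negative"},   # dismissive framing
        {"unit": "but I know my body better than anyone", "sentiment": "positive"},
        {"unit": "I push through the pain...", "sentiment": "mixed"},      # hardship plus resilience
    ],
    "document_level": {
        "stance_towards_medical_authority": "negative",
        "self_assertion": "positive",
    },
}
```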
What matters here is consistency and computability. Labels need to be unambiguous, reproducible and simple enough that annotators can agree on them. While this often trims nuance, it enables datasets to scale to millions of examples. The challenge, and the opportunity, lies in designing schemas that are efficient and still sensitive to the ethical and social realities encoded in data.
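Agreement between annotators is usually checked with a chance-corrected statistic such as Cohen's kappa. A minimal sketch, assuming scikit-learn is available and using made-up labels from two hypothetical annotators:

```python
# Inter-annotator agreement on the same ten items, via Cohen's kappa.
# The labels below are made up purely for illustration.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["neg", "neg", "pos", "neu", "pos", "neg", "pos", "neu", "neg", "pos"]
annotator_b = ["neg", "pos", "pos", "neu", "pos", "neg", "neu", "neu", "neg", "pos"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values above roughly 0.6 are often read as substantial agreement
```

Low scores are often a sign that the guidelines, rather than the annotators, need work.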
Side by Side Comparison
| Aspect | Linguistics / Discourse | NLP / Machine Learning |
|---|---|---|
| Unit of analysis | Clause or phrase | Token, sentence, document |
| Labels | Identity, agency, metaphor, stance | Sentiment, entity, intent, topic |
| Purpose | Interpretation and uncovering ideology | Dataset creation for model training |
| Output | Narrative and layered analysis | CSV or JSON with categorical labels |
| Strength | Nuance and theoretical depth | Scale and consistency |
| Limitation | Small scope, subjective interpretation | Oversimplification, hidden bias |
What Each World Can Learn from the Other
From linguistics, NLP can gain awareness of how identity, power and metaphor shape data. Without that awareness, AI models risk reproducing the very biases they are meant to avoid. From NLP, linguistics can borrow methods of standardisation, agreement checking and scale, strengthening corpus-based research.
The most exciting possibilities come from mixing the two: hybrid annotation frameworks where automatic tagging handles scale while human analysts add interpretive layers that capture nuance.
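One way to prototype that hybrid setup is to let an off-the-shelf model propose entity spans and leave an explicitly interpretive layer for a human analyst to complete. A minimal sketch, assuming spaCy and its small English model are installed; the discourse fields are invented placeholders, not an established schema.

```python
# Hybrid annotation sketch: automatic entity spans from spaCy,
# plus an empty interpretive layer for a human analyst to fill in later.
# Assumes spaCy and the en_core_web_sm model are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

text = ("Doctors keep telling me it's just stress, but I know my body "
        "better than anyone. I push through the pain to go to work, "
        "even when it feels impossible.")

doc = nlp(text)

record = {
    "text": text,
    # Machine layer: fast, consistent, scalable.
    "auto_entities": [
        {"span": ent.text, "start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ],
    # Human layer: interpretive fields a discourse analyst completes.
    "discourse_layer": {
        "stance": None,        # e.g. counter-expectation, concession
        "metaphor": None,      # e.g. illness as barrier
        "agency": None,        # e.g. reclaimed epistemic authority
    },
}
print(record["auto_entities"])
```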
Conclusion
Annotation is never neutral. Whether in a linguistics seminar or an AI lab, the act of assigning labels is also the act of assigning meaning. In linguistics, annotation shows how language encodes power, identity and metaphor. In NLP, annotation creates the datasets that will shape how machines understand human communication.
Both traditions have strengths: the nuance and theoretical depth of discourse analysis, and the scale and reliability of computational annotation. Both also have limitations: subjectivity and small scope on the linguistic side, oversimplification and bias reproduction on the computational side.
The future lies in building bridges. Hybrid annotation can bring the best of both worlds, creating large-scale datasets enriched by discourse-sensitive insights. This does not just improve models, it also ensures that the human realities in language — identity, agency, resistance, resilience — are not flattened into categories.
As someone moving between these two spaces, I see annotation not as a mechanical step but as a meeting point between humanities and technology. Done well, it can produce systems that are more accurate, more ethical and more attuned to the complexities of lived experience. And if we can sneak a little discourse analysis into the machine learning pipeline along the way, that is a win for both sides.