AI, Machine Learning, and NLP: Origins, Relationships, and Taxonomy
Introduction
The terms Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP) are often used interchangeably in casual conversation, but they refer to distinct, though closely related, concepts. Understanding their origins and how they relate to each other is essential for anyone working in the field. This post traces the history of each term, clarifies the relationships between them, and provides a clean taxonomy of methodologies vs. domains vs. tasks.
1. When Were These Terms Coined?
Artificial Intelligence (AI)
- Coined by: John McCarthy
- Year: 1956
- Context: At the famous Dartmouth Summer Research Project on Artificial Intelligence held at Dartmouth College, USA. McCarthy, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, organized this workshop, which is widely regarded as the birth of AI as a formal academic discipline.
Machine Learning (ML)
- Coined by: Arthur Samuel
- Year: 1959
- Context: While working at IBM, Samuel developed a self-learning checkers program. He defined Machine Learning as “the field of study that gives computers the ability to learn without being explicitly programmed.”
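Samuel's definition can be illustrated with a toy sketch (not his checkers program): instead of hard-coding a decision threshold, the program estimates it from labeled examples. All names and data here are hypothetical, chosen only to make the contrast concrete.

```python
# "Learning without being explicitly programmed": the decision
# boundary below is derived from data, not written by hand.

def learn_threshold(values, labels):
    """Learn a 1-D decision boundary as the midpoint of the two class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(value, threshold):
    return 1 if value >= threshold else 0

# Hypothetical training data: exam scores labeled pass (1) / fail (0)
scores = [35, 40, 45, 70, 75, 80]
labels = [0, 0, 0, 1, 1, 1]

t = learn_threshold(scores, labels)  # 57.5, computed from the data
print(predict(90, t))  # 1
print(predict(30, t))  # 0
```

Changing the training data changes the behavior, with no change to the code itself — that is the essence of Samuel's definition.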
Natural Language Processing (NLP)
Unlike AI and ML, NLP does not have a single definitive originator or moment of creation. It evolved gradually:
| Year | Milestone | Key Figures |
|------|-----------|-------------|
| 1950 | Alan Turing published “Computing Machinery and Intelligence”, proposing the Turing Test — testing machine intelligence through natural language conversation | Alan Turing |
| 1954 | Georgetown-IBM Experiment — the first public demonstration of machine translation (translating 60+ Russian sentences into English automatically). Widely considered the starting point of NLP research | Leon Dostert, Paul Garvin (Georgetown University & IBM) |
| 1957 | Noam Chomsky published “Syntactic Structures”, introducing generative grammar theory that laid the linguistic foundation for computational language understanding | Noam Chomsky |
| 1960s | The term “Natural Language Processing” began appearing widely in academic literature, though no single person is credited with coining it | — |
| 1966 | ELIZA, one of the earliest chatbots, was created at MIT | Joseph Weizenbaum |
Summary: If a single starting point must be chosen, the 1954 Georgetown-IBM machine translation experiment is the consensus origin of NLP research. Noam Chomsky’s formal language theory provided its critical theoretical foundation.
2. How Do AI, ML, and NLP Relate to Each Other?
The Big Picture
```
Artificial Intelligence (AI) ─── The overarching goal: make machines intelligent
│
├── Machine Learning (ML) ─── A methodology: learn from data
│   ├── Deep Learning (DL)
│   └── Traditional ML (SVM, Decision Trees, etc.)
│
└── Natural Language Processing (NLP) ─── An application domain: understand & generate human language
    ├── Rule-based approaches (1950s–1980s)
    ├── Statistical / ML-based approaches (1990s–2012)
    └── Deep Learning-based approaches (2013–present)
```
Key Distinctions
| Concept | Nature | Analogy |
|---------|--------|---------|
| AI | A broad goal (make machines intelligent like humans) | “Medicine” as a discipline |
| ML | A method/means to achieve AI (learn from data) | “Drug therapy” as a treatment method |
| NLP | An application domain of AI (process human language) | “Cardiology” as a specialty |
The Intersection
- NLP is a sub-field of AI: NLP’s goal (making machines understand language) is part of AI’s overarching objective.
- ML is the primary method used in NLP: Modern NLP relies almost entirely on ML (especially Deep Learning).
- NLP ≠ ML: Early NLP was predominantly based on hand-crafted rules and linguistic knowledge, not ML at all.
```
┌─────────────────────────────────┐
│  AI (Artificial                 │
│      Intelligence)              │
│                                 │
│  ┌──────────┐   ┌──────────┐    │
│  │   ML     │   │   NLP    │    │
│  │ (Method) │   │ (Domain) │    │
│  │          │   │          │    │
│  │    ┌─────┼───┼────┐     │    │
│  │    │ Modern NLP     │   │    │
│  │    │ (ML-powered)   │   │    │
│  │    └─────┼───┼────┘     │    │
│  └──────────┘   └──────────┘    │
└─────────────────────────────────┘
```
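The rule-based vs. ML distinction can be made concrete with a toy sentiment example. Both functions below are hypothetical sketches written for this post: the first encodes behavior in hand-written word lists (the early NLP style), the second derives its word scores from labeled training data (the ML style).

```python
from collections import Counter

# Rule-based (1950s–1980s style): behavior comes from hand-written rules.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "awful"}

def rule_based_sentiment(text):
    words = set(text.lower().split())
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    return "positive" if score >= 0 else "negative"

# ML-based (1990s+ style): behavior comes from labeled training data.
def train_word_scores(texts, labels):
    """Learn a per-word sentiment score from labeled examples."""
    scores = Counter()
    for text, label in zip(texts, labels):
        for word in text.lower().split():
            scores[word] += 1 if label == "positive" else -1
    return scores

def ml_sentiment(text, scores):
    total = sum(scores[w] for w in text.lower().split())
    return "positive" if total >= 0 else "negative"

train_texts = ["what a great film", "truly awful acting", "great fun"]
train_labels = ["positive", "negative", "positive"]
scores = train_word_scores(train_texts, train_labels)
print(ml_sentiment("great acting", scores))  # positive
```

To improve the rule-based system you edit the rules; to improve the ML system you collect more data. That difference is why the two eras of NLP feel so unlike each other in practice.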
3. Methodology vs. Domain vs. Task — A Clean Taxonomy
A common source of confusion is mixing up methods, domains, and tasks. Here is how to think about them:
| Category | Question It Answers | Examples |
|----------|---------------------|----------|
| Methodology (How) | How does the machine learn? | Deep Learning, Reinforcement Learning, Supervised Learning, Unsupervised Learning, Transfer Learning |
| Domain (What) | What type of data/problem are we dealing with? | NLP, Computer Vision (CV), Speech Processing, Multimodal AI, Robotics |
| Task (Specific What) | What specific job are we doing? | Image Recognition, Text-to-Image, Text-to-Video, Text-to-Speech, Machine Translation, Q&A |
Full Hierarchy
```
AI (Artificial Intelligence)
│
├── Methodologies (How to learn)
│   ├── Machine Learning (ML)
│   │   ├── Supervised Learning
│   │   ├── Unsupervised Learning
│   │   ├── Reinforcement Learning   ← Method
│   │   └── Deep Learning            ← Method (subset of ML)
│   │       ├── CNN, RNN, Transformer ...
│   │       └── (Can combine with any learning paradigm above)
│   └── Non-ML approaches (rule-based systems, search algorithms, knowledge graphs, etc.)
│
├── Domains (What type of data)
│   ├── Natural Language Processing (NLP) — Text
│   ├── Computer Vision (CV) — Images / Video
│   ├── Speech Processing — Audio / Voice
│   ├── Multimodal AI — Cross-modal
│   └── Robotics — Physical interaction
│
└── Tasks (Specific applications under domains)
    ├── Image Recognition / Classification → CV
    ├── Text-to-Image (e.g., DALL·E, SD) → Multimodal (NLP + CV)
    ├── Text-to-Video (e.g., Sora) → Multimodal (NLP + CV)
    ├── Text-to-Speech (TTS) → Multimodal (NLP + Speech)
    ├── Machine Translation → NLP
    └── Dialogue / Q&A → NLP
```
Key Nuances
| Concept | Clarification |
|---------|---------------|
| Deep Learning | Strictly a subset of ML (methods based on deep neural networks). It can be combined with supervised, unsupervised, or reinforcement learning. |
| Reinforcement Learning | A learning paradigm (learning through interaction with an environment via reward/penalty feedback). It can be implemented with or without deep learning (Deep RL vs. tabular RL). |
| Text-to-Image / Text-to-Video | These are specific tasks, one level more granular than “domain”. They belong to the Multimodal or Generative AI domain. |
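The "tabular RL" nuance can be made concrete with a minimal Q-learning sketch that uses no neural network at all — just a table of values. The corridor environment below is hypothetical, invented for this illustration: states 0–3, with a reward of 1 for stepping right out of the last state.

```python
import random

N_STATES, ACTIONS = 4, [0, 1]   # actions: 0 = left, 1 = right
ALPHA, GAMMA = 0.5, 0.9         # learning rate, discount factor

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # the Q-"table" in tabular RL

def step(state, action):
    """Corridor MDP transition: reward 1 for exiting right at the far end."""
    if action == 1 and state == N_STATES - 1:
        return state, 1.0, True                      # goal reached, episode ends
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, 0.0, False

random.seed(0)
for _ in range(500):                      # explore with a random behavior policy
    s, done, steps = 0, False, 0
    while not done and steps < 100:       # cap episode length for safety
        a = random.choice(ACTIONS)
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r if done else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s, steps = s2, steps + 1

# The learned greedy policy heads right, toward the rewarded end of the corridor
print(["right" if Q[s][1] > Q[s][0] else "left" for s in range(N_STATES)])
```

Deep RL replaces the `Q` table with a neural network that approximates the same function — the learning paradigm (reward-driven updates) is unchanged, which is exactly why RL is a method, not a domain.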
4. Three Eras of NLP
| Era | Period | Core Approach | Notable Examples |
|-----|--------|---------------|------------------|
| Rule-based | 1950s–1980s | Hand-written grammar rules, expert systems | ELIZA, SHRDLU |
| Statistical / ML | 1990s–2012 | Statistical models, traditional machine learning | HMM, CRF, SVM, TF-IDF |
| Deep Learning | 2013–present | Neural networks, pre-trained large models | Word2Vec (2013), Transformer (2017), BERT (2018), GPT series (2018–), ChatGPT (2022) |
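To give a flavor of the statistical era, here is a from-scratch sketch of TF-IDF, one of its hallmark techniques: a word's weight is its frequency in a document, discounted by how many documents it appears in. The corpus is made up for this example.

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(term, doc_tokens):
    """Term frequency x inverse document frequency for one term/doc pair."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for toks in tokenized if term in toks)   # document frequency
    idf = math.log(N / df)                              # rarer terms score higher
    return tf * idf

# "the" appears in 2 of 3 docs, "cat" in only 1, so "cat" is more informative
print(round(tf_idf("cat", tokenized[0]), 3))  # 0.183
print(round(tf_idf("the", tokenized[0]), 3))  # 0.135
```

Note what is statistical here: the weights come from corpus counts, but nothing is *learned* end-to-end — in the deep-learning era, Word2Vec and its successors replaced such hand-designed weighting schemes with vectors optimized directly from data.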
5. Timeline of Key Milestones
| Year | Event |
|------|-------|
| 1950 | Alan Turing publishes “Computing Machinery and Intelligence”, proposes the Turing Test |
| 1954 | Georgetown-IBM experiment — first machine translation demo (origin of NLP) |
| 1956 | John McCarthy coins “Artificial Intelligence” at the Dartmouth Conference |
| 1957 | Noam Chomsky publishes “Syntactic Structures” |
| 1959 | Arthur Samuel coins “Machine Learning” |
| 1966 | ELIZA chatbot created by Joseph Weizenbaum |
| 1986 | Rumelhart et al. popularize backpropagation, advancing neural networks |
| 1997 | IBM Deep Blue defeats world chess champion Garry Kasparov |
| 2012 | Deep Learning achieves breakthrough at ImageNet (AlexNet), sparking new AI boom |
| 2013 | Word2Vec introduces efficient word embeddings |
| 2017 | Transformer architecture proposed (“Attention Is All You Need”) |
| 2018 | BERT and GPT-1 released, launching the pre-trained model era |
| 2022 | ChatGPT released, bringing LLMs into the mainstream |
| 2023+ | Multimodal models (GPT-4V, Sora, etc.) blur the lines between domains |
Summary
| Term | What It Is | Analogy |
|------|------------|---------|
| AI | The destination (make machines “smart”) | “Cooking” as a concept |
| ML | An important road to the destination (learn from data) | Cooking techniques (stir-fry, steam, bake) |
| NLP / CV / Speech | Specific terrain you’re navigating (type of data) | Cuisines (Sichuan, Cantonese, Fusion) |
| Text-to-Image, Translation, etc. | The specific dish you’re making | Individual dishes |
| Deep Learning / RL | Powerful tools you use along the way | A pressure cooker — works with any technique or cuisine |
Modern Large Language Models (LLMs) are essentially the product of NLP + Deep Learning (a subset of ML) — a deep convergence of AI’s goal, ML’s methodology, and NLP’s domain expertise.