AI, Machine Learning, and NLP: Origins, Relationships, and Taxonomy

Introduction

The terms Artificial Intelligence (AI), Machine Learning (ML), and Natural Language Processing (NLP) are often used interchangeably in casual conversation, but they refer to very different things. Understanding their origins and how they relate to each other is essential for anyone working in the field. This post traces the history of each term, clarifies the relationships between them, and provides a clean taxonomy of methodologies vs. domains vs. tasks.

1. When Were These Terms Coined?

Artificial Intelligence (AI)

Coined by: John McCarthy
Year: 1956
Context: At the famous Dartmouth Summer Research Project on Artificial Intelligence held at Dartmouth College, USA. McCarthy, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, organized this workshop, which is widely regarded as the birth of AI as a formal academic discipline.

Machine Learning (ML)

Coined by: Arthur Samuel
Year: 1959
Context: While working at IBM, Samuel developed a self-learning checkers program. He defined Machine Learning as “the field of study that gives computers the ability to learn without being explicitly programmed.”

Natural Language Processing (NLP)

Unlike AI and ML, NLP does not have a single definitive originator or moment of creation. It evolved gradually:

Year	Milestone	Key Figures
1950	Alan Turing published “Computing Machinery and Intelligence”, proposing the Turing Test — testing machine intelligence through natural language conversation	Alan Turing
1954	Georgetown-IBM Experiment — the first public demonstration of machine translation (translating 60+ Russian sentences into English automatically). Widely considered the starting point of NLP research	Leon Dostert, Paul Garvin (Georgetown University & IBM)
1957	Noam Chomsky published “Syntactic Structures”, introducing generative grammar theory that laid the linguistic foundation for computational language understanding	Noam Chomsky
1960s	The term “Natural Language Processing” began appearing widely in academic literature, though no single person is credited with coining it	—
1966	ELIZA, one of the earliest chatbots, was created at MIT	Joseph Weizenbaum

Summary: If a single starting point must be chosen, the 1954 Georgetown-IBM machine translation experiment is the consensus origin of NLP research. Noam Chomsky’s formal language theory provided its critical theoretical foundation.

2. How Do AI, ML, and NLP Relate to Each Other?

The Big Picture

Artificial Intelligence (AI) ─── The overarching goal: make machines intelligent
  │
  ├── Machine Learning (ML) ─── A methodology: learn from data
  │     ├── Deep Learning (DL)
  │     └── Traditional ML (SVM, Decision Trees, etc.)
  │
  └── Natural Language Processing (NLP) ─── An application domain: understand & generate human language
        ├── Rule-based approaches (1950s–1980s)
        ├── Statistical / ML-based approaches (1990s–2012)
        └── Deep Learning-based approaches (2013–present)

Key Distinctions

Concept	Nature	Analogy
AI	A broad goal (make machines intelligent like humans)	“Medicine” as a discipline
ML	A method/means to achieve AI (learn from data)	“Drug therapy” as a treatment method
NLP	An application domain of AI (process human language)	“Cardiology” as a specialty

The Intersection

NLP is a sub-field of AI: NLP’s goal (making machines understand language) is part of AI’s overarching objective.
ML is the primary method used in NLP: Modern NLP relies almost entirely on ML (especially Deep Learning).
NLP ≠ ML: Early NLP was predominantly based on hand-crafted rules and linguistic knowledge, not ML at all.

       ┌─────────────────────────────────┐
       │         AI (Artificial           │
       │         Intelligence)            │
       │                                  │
       │   ┌──────────┐  ┌──────────┐    │
       │   │    ML    │  │   NLP    │    │
       │   │ (Method) │  │ (Domain) │    │
       │   │          │  │          │    │
       │   │    ┌─────┼──┼────┐     │    │
       │   │    │ Modern NLP  │     │    │
       │   │    │ (ML-powered)│     │    │
       │   │    └─────┼──┼────┘     │    │
       │   └──────────┘  └──────────┘    │
       └─────────────────────────────────┘

3. Methodology vs. Domain vs. Task — A Clean Taxonomy

A common source of confusion is mixing up methods, domains, and tasks. Here is how to think about them:

Category	Question It Answers	Examples
Methodology (How)	How does the machine learn?	Deep Learning, Reinforcement Learning, Supervised Learning, Unsupervised Learning, Transfer Learning
Domain (What)	What type of data/problem are we dealing with?	NLP, Computer Vision (CV), Speech Processing, Multimodal AI, Robotics
Task (Specific What)	What specific job are we doing?	Image Recognition, Text-to-Image, Text-to-Video, Text-to-Speech, Machine Translation, Q\&A

Full Hierarchy

AI (Artificial Intelligence)
│
├── Methodologies (How to learn)
│   ├── Machine Learning (ML)
│   │   ├── Supervised Learning
│   │   ├── Unsupervised Learning
│   │   ├── Reinforcement Learning        ← Method
│   │   └── Deep Learning                 ← Method (subset of ML)
│   │       ├── CNN, RNN, Transformer ...
│   │       └── (Can combine with any learning paradigm above)
│   └── Non-ML approaches (rule-based systems, search algorithms, knowledge graphs, etc.)
│
├── Domains (What type of data)
│   ├── Natural Language Processing (NLP)    — Text
│   ├── Computer Vision (CV)                 — Images / Video
│   ├── Speech Processing                    — Audio / Voice
│   ├── Multimodal AI                        — Cross-modal
│   └── Robotics                             — Physical interaction
│
└── Tasks (Specific applications under domains)
    ├── Image Recognition / Classification   → CV
    ├── Text-to-Image (e.g., DALL·E, SD)    → Multimodal (NLP + CV)
    ├── Text-to-Video (e.g., Sora)          → Multimodal (NLP + CV)
    ├── Text-to-Speech (TTS)                → Multimodal (NLP + Speech)
    ├── Machine Translation                  → NLP
    └── Dialogue / Q&A                       → NLP

Key Nuances

Concept	Clarification
Deep Learning	Strictly a subset of ML (methods based on deep neural networks). It can be combined with supervised, unsupervised, or reinforcement learning.
Reinforcement Learning	A learning paradigm (learning through interaction with an environment via reward/penalty feedback). It can be implemented with or without deep learning (Deep RL vs. tabular RL).
Text-to-Image / Text-to-Video	These are specific tasks, one level more granular than “domain”. They belong to the Multimodal or Generative AI domain.

4. Three Eras of NLP

Era	Period	Core Approach	Notable Examples
Rule-based	1950s–1980s	Hand-written grammar rules, expert systems	ELIZA, SHRDLU
Statistical / ML	1990s–2012	Statistical models, traditional machine learning	HMM, CRF, SVM, TF-IDF
Deep Learning	2013–present	Neural networks, pre-trained large models	Word2Vec (2013), Transformer (2017), BERT (2018), GPT series (2018–), ChatGPT (2022)

5. Timeline of Key Milestones

Year	Event
1950	Alan Turing publishes “Computing Machinery and Intelligence”, proposes the Turing Test
1954	Georgetown-IBM experiment — first machine translation demo (origin of NLP)
1956	John McCarthy coins “Artificial Intelligence” at the Dartmouth Conference
1957	Noam Chomsky publishes “Syntactic Structures”
1959	Arthur Samuel coins “Machine Learning”
1966	ELIZA chatbot created by Joseph Weizenbaum
1986	Rumelhart et al. popularize backpropagation, advancing neural networks
1997	IBM Deep Blue defeats world chess champion Garry Kasparov
2012	Deep Learning achieves breakthrough at ImageNet (AlexNet), sparking new AI boom
2013	Word2Vec introduces efficient word embeddings
2017	Transformer architecture proposed (“Attention Is All You Need”)
2018	BERT and GPT-1 released, launching the pre-trained model era
2022	ChatGPT released, bringing LLMs into the mainstream
2023+	Multimodal models (GPT-4V, Sora, etc.) blur the lines between domains

Summary

Term	What It Is	Analogy
AI	The destination (make machines “smart”)	“Cooking” as a concept
ML	An important road to the destination (learn from data)	Cooking techniques (stir-fry, steam, bake)
NLP / CV / Speech	Specific terrain you’re navigating (type of data)	Cuisines (Sichuan, Cantonese, Fusion)
Text-to-Image, Translation, etc.	The specific dish you’re making	Individual dishes
Deep Learning / RL	Powerful tools you use along the way	A pressure cooker — works with any technique or cuisine

Modern Large Language Models (LLMs) are essentially the product of NLP + Deep Learning (a subset of ML) — a deep convergence of AI’s goal, ML’s methodology, and NLP’s domain expertise.