History Of Natural Language Processing

  • ANCIENT HISTORY TO 1700S: FOUNDATIONS OF LINGUISTICS

    Long before the age of computers, the study of language and linguistics was fundamental to the development of NLP. Scholars like Panini in ancient India, who composed a comprehensive grammar of Sanskrit, laid the foundations of structural linguistics that would later inspire computational techniques.
    Book: The Ashtadhyayi Of Panini
  • 1800S - EARLY 1900S: MATHEMATICAL AND PHILOSOPHICAL FOUNDATIONS

    The formal study of logic, mathematics, and philosophy, especially the work of George Boole and his Boolean algebra, laid the groundwork for the symbolic representation of logical statements. This made it feasible to envision computation with language.
    Book: An Investigation of the Laws of Thought by George Boole
  • 1930s: TURING'S UNIVERSAL MACHINE

    Alan Turing introduced the concept of a universal machine, later known as the Turing machine. This theoretical construct formed the basis for how we understand computation and algorithms today. Turing's work was foundational not just for computer science but also for imagining the possibility of machines processing language.
    Paper: On Computable Numbers, with an Application to the Entscheidungsproblem by Alan Turing [1936]
  • 1940s: FIRST ATTEMPTS AT MACHINE TRANSLATION

    In the wake of World War II, the idea of automatic translation between languages (particularly Russian to English) became of significant interest in the United States. Early experiments in this decade laid the groundwork for what would later become the field of machine translation.

  • 1945: VANNEVAR BUSH'S "MEMEX"

    Vannevar Bush proposed a hypothetical device called "Memex" in his essay "As We May Think". This machine would allow users to create links between documents, much like hyperlinks on the web today, anticipating the future of information retrieval and the importance of context in understanding language.
    Resource: As We May Think by Vannevar Bush
    
    Book: The Essential Writings of Vannevar Bush
  • 1946: ENIAC – ELECTRONIC NUMERICAL INTEGRATOR AND COMPUTER

    The ENIAC was the first general-purpose electronic digital computer. While it was not used for NLP tasks, its existence and the developments surrounding it paved the way for the feasibility of computational tasks, including those related to language.
    Book: ENIAC in Action: Making and Remaking the Modern Computer
    
    Wiki: ENIAC
  • 1950: TURING TEST

    Alan Turing proposed a measure of machine intelligence where a machine is said to exhibit human-like intelligence if it can imitate human responses under specific conditions. This test became a foundational concept for artificial intelligence and, by extension, natural language processing.
    Paper: Computing Machinery and Intelligence
    
    Book: The Turing Test: Verbal Behavior as the Hallmark of Intelligence
  • 1951: UNIVAC I – THE FIRST COMMERCIAL COMPUTER

    The Universal Automatic Computer I (UNIVAC I) was the first commercially available computer. It marked the beginning of a new era where computing resources would be more widely accessible, opening the door for advancements in multiple fields, including natural language processing.
    Wiki: UNIVAC I
  • 1954: GEORGETOWN-IBM EXPERIMENT

    This was an early demonstration of a machine translation system, where more than 60 Russian sentences were automatically translated into English. Sponsored by the U.S. government, it was one of the first public displays of machine translation capabilities.
    Research Paper: The first public demonstration of machine translation: the Georgetown-IBM system
  • 1956: GRAMMARS AND CHOMSKY'S HIERARCHY

    Noam Chomsky, in his work "Three Models for the Description of Language", introduced different types of grammars (Type 0, Type 1, Type 2, Type 3) which later came to be known as Chomsky's Hierarchy. These grammars provided a structured way to think about languages and their complexity, greatly influencing NLP.
    Research Paper: Three Models For The Description Of Language
  • 1957: THE PERCEPTRON

    Frank Rosenblatt introduced the Perceptron, an algorithm for supervised learning of binary classifiers. Though initially not applied directly to NLP, the ideas from the Perceptron would later influence models in NLP, especially with the rise of neural networks.
    Book: The Perceptron [Cornell]
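    A minimal sketch of the perceptron learning rule, shown here on an invented toy problem (learning the logical AND function); the data, learning rate, and epoch count are illustrative choices, not from Rosenblatt's original report:

      # Toy perceptron: weights are nudged only when an example is misclassified.
      data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # inputs -> AND label

      w = [0.0, 0.0]   # weights
      b = 0.0          # bias
      lr = 0.1         # learning rate

      for epoch in range(10):
          for x, target in data:
              activation = w[0] * x[0] + w[1] * x[1] + b
              prediction = 1 if activation > 0 else 0
              error = target - prediction          # -1, 0, or +1
              w[0] += lr * error * x[0]
              w[1] += lr * error * x[1]
              b += lr * error

      print(w, b)  # converges to a separating line for AND, e.g. w = [0.2, 0.1], b = -0.2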
  • 1960: ALGOL 60 AND FORMAL GRAMMARS

    The ALGOL 60 report introduced the use of BNF (Backus-Naur Form), a notation to describe formal grammars. Though intended for programming languages, the use of BNF laid foundational ideas for formalizing natural language syntax in NLP.
    Report: Report on the Algorithmic Language ALGOL 60
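    To make the idea of BNF-style production rules concrete, here is a tiny, made-up context-free grammar (not the ALGOL 60 grammar) encoded as a Python dictionary, with a generator that expands non-terminals recursively:

      import random

      # Toy grammar in the spirit of BNF:
      #   <S> ::= <NP> <VP>     <NP> ::= "the" <N>     <VP> ::= <V> <NP>
      grammar = {
          "S":  [["NP", "VP"]],
          "NP": [["the", "N"]],
          "VP": [["V", "NP"]],
          "N":  [["parser"], ["grammar"], ["sentence"]],
          "V":  [["accepts"], ["rejects"]],
      }

      def generate(symbol):
          """Expand a symbol by picking one alternative per non-terminal."""
          if symbol not in grammar:                 # terminal: emit as-is
              return [symbol]
          expansion = random.choice(grammar[symbol])
          return [word for part in expansion for word in generate(part)]

      print(" ".join(generate("S")))   # e.g. "the parser rejects the sentence"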
  • 1961: BASEBALL

    The "BASEBALL" program, developed by Bert Green and colleagues at MIT Lincoln Laboratory, was one of the first question-answering systems: it could answer questions posed in English about a database of baseball statistics. It was an early demonstration of the potential for computers to understand and respond to user queries about a specific domain, and it is cited in Weizenbaum's 1966 ELIZA paper.

  • 1968-1970: SHRDLU

    Terry Winograd created SHRDLU at MIT between 1968 and 1970, an early natural language processing computer program. It could understand and execute instructions given in natural language concerning a virtual world of toy blocks, combining the understanding and generation of natural language with a reasoning component.
    Book: Language As a Cognitive Process: Syntax
  • 1964: SYNTACTIC PATTERN RECOGNITION

    Rohit Parikh proposed the application of syntactic (or structural) pattern recognition to the problem of optical character recognition, setting the stage for applications of formal grammar and automata theory in pattern recognition, which would later be applied in NLP tasks.
    Wiki: Parikh's Theorem
  • 1964: STUDENT, DEFDEF, AND SIR

    Daniel Bobrow's doctoral dissertation at MIT introduced the program called "STUDENT", which could solve algebra word problems stated in English. It was accompanied by related systems such as DEFDEF (a definition-based question-answering system) and Bertram Raphael's SIR (Semantic Information Retrieval), developed or refined around the same period.
    Research Paper: Natural Language Input For A Computer Problem Solving System
  • 1964: BROWN CORPUS RELEASE

    The Brown University Standard Corpus of Present-Day American English, commonly referred to as the Brown Corpus, was a landmark in NLP. Compiled in the early 1960s and released in 1964, this corpus of roughly one million English words became a foundation for statistical NLP methods.
    Manual: Brown Corpus Manual
  • 1965: SEMANTIC NETWORKS

    Ross Quillian proposed the concept of semantic networks, a graphical notation for representing knowledge in patterns of interconnected nodes and arcs. These networks became foundational for knowledge representation in artificial intelligence and NLP.
    Book: Quillian, M. Ross. (1966). Semantic Memory. Semantic Information Processing.
    Book: Quillian, M. Ross. (1969). The Teachable Language Comprehender: A Simulation Program and Theory of Language. Communications of the ACM.
  • 1964-1966: ELIZA

    Joseph Weizenbaum developed "ELIZA" in the mid-60s at MIT, a computer program designed to emulate a Rogerian psychotherapist. Using pattern matching and substitution methodologies, ELIZA provided a semblance of understanding, marking one of the earliest attempts at creating a chatbot.
    Research Paper: ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine (Communications of the ACM, 1966)
  • 1966: ALPAC REPORT

    The Automatic Language Processing Advisory Committee (ALPAC) was formed by the U.S. government to evaluate the state and progress of machine translation research. The committee's report was largely negative and led to a significant reduction in funding for machine translation research in the U.S.
    Report: ALPAC: the (in)famous report
    Wiki: ALPAC
  • 1969: DENDRAL

    Edward Feigenbaum, Joshua Lederberg, and Bruce Buchanan at Stanford University developed DENDRAL, the first expert system. Initially designed for organic chemistry, inferring molecular structure from mass spectrometer data, its approach and techniques became influential in AI and knowledge representation. While not strictly NLP, it is closely related in the broader context of AI's evolution.
    Paper: DENDRAL: A Case Study Of The First Expert System For Scientific Hypothesis Formation
  • 1970: AUGMENTED TRANSITION NETWORKS (ATNs)

    William Woods introduced Augmented Transition Networks (ATNs) as a form of grammar for natural language processing. ATNs were powerful in representing semantic information and became widely used in early NLP systems.
    Research Paper: Transition Network Grammars for Natural Language Analysis
  • 1970s: MORPHOLOGICAL PROCESSING

    Morphological processing pertains to the structure and formation of words. Martin Kay played a pivotal role in the early '70s by developing advanced algorithms for morphological analysis. His work laid the groundwork for systems to recognize and process different word forms, enhancing the efficiency of language models and their understanding of word variations.

  • 1972: LUNAR SYSTEM

    The LUNAR system, developed by William Woods and colleagues at BBN, was among the first natural language interfaces designed for databases. This was groundbreaking because it allowed scientists to query a database of Apollo moon rock samples using plain English, showcasing the potential of NLP in practical applications.
    Paper: The Lunar Sciences Natural Language Information System: Final Report
  • 1973: SCRIPT THEORY

    Roger Schank and Robert Abelson's "script" concept provided a structured representation of sequences in stories. This was integral for machines to comprehend narratives. By identifying and categorizing common sequences in stories, NLP systems could predict and understand a range of scenarios and contexts within human narratives.
    Book: Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures
    
    Paper: Scripts, Plans, and Knowledge
  • 1973: WINOGRAD’S SHRDLU COMPLETED

    Terry Winograd's SHRDLU system, completed in 1973, could comprehend and formulate English sentences in a simulated "blocks world". This achievement demonstrated how computers could understand and interact using natural language in a confined environment, marking an early success in the journey towards conversational AI.
    Paper: SHRDLU
  • 1975: CONCEPTUAL DEPENDENCY THEORY

    Roger Schank introduced Conceptual Dependency Theory for natural language understanding in his book "Conceptual Information Processing". This theory proposed that there are a limited number of primitive actions that can represent all possible actions in terms of their conceptual meaning.
    Book: Conceptual Information Processing by R. C. Schank
  • 1975: LEXICAL FUNCTIONAL GRAMMAR

    Ron Kaplan and Joan Bresnan introduced "Lexical Functional Grammar", emphasizing the importance of lexicon and syntactic structures in language processing. This shift in perspective from traditional grammar theories made it more amenable for computational implementations, influencing future NLP systems.
    Paper: Lexical-Functional Grammar: A Formal System for Grammatical Representation
    
    Book: Lexical-Functional Grammar: An Introduction
  • 1977: GUS: A FRAME-DRIVEN DIALOG SYSTEM

    Co-authored by Martin Kay and colleagues at Xerox PARC, the GUS system was an early example of a frame-driven dialogue system, carrying out a simple travel-booking conversation in English. Its frame-based approach was a notable development of the era, showing how structured data representations could be employed to manage and interpret user input in natural language processing tasks.
    Paper: GUS, A Frame-Driven Dialog System
  • 1982: CHAT-80

    CHAT-80, developed by David Warren and Fernando Pereira, could interpret questions about world geography, translate them into Prolog queries, and return the answers in English. This innovative system showcased the potential of NLP to be integrated with database management, providing natural language interfaces for users.
    Paper: An Efficient Easily Adaptable System for Interpreting Natural Language Queries
  • 1979: FUNCTIONAL GRAMMAR

    Martin Kay's work on functional grammar provided insights into how linguistic structures can be analyzed and represented. This work laid foundations for subsequent studies in functional approaches to grammar, emphasizing the roles that different elements of sentences play in conveying meaning.
    Paper: Functional Grammar
  • 1980s: FRAME-BASED SYSTEMS FOR NLP

    In the early 1980s, there was a move towards frame-based systems in NLP, which represent stereotyped situations. Minsky's "Frames" paper, although written in the 70s, had a substantial influence on this trend.
    Paper: A Framework For Representing Knowledge by Marvin Minsky [1974]
  • 1980: MORPHOLOGICAL PROCESSING

    During the early 1980s, there was significant progress in morphological processing, which deals with understanding the structure of words. This was essential in the development of early NLP systems that aimed to understand and generate human language.
    Book: Finite State Morphology
  • 1986: CONNECTIONIST APPROACHES TO LANGUAGE

    This period saw a surge in the popularity of connectionist models, also known as neural networks, for NLP tasks. These models were different from symbolic AI models that were popular at the time. Rumelhart, Hinton, and Williams' work on backpropagation was influential in popularizing these models.
    Paper: Learning Representations by Back-Propagating Errors
  • 1990: HIDDEN MARKOV MODELS FOR SPEECH RECOGNITION

    By the late 80s and early 90s, Hidden Markov Models (HMMs) became the dominant approach for speech recognition, revolutionizing the field. These probabilistic models provided a framework for handling uncertainty in spoken language.
    Book: Fundamentals of Speech Recognition
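    A compact sketch of Viterbi decoding for a toy two-state HMM; the states, vocabulary, and probabilities below are invented for illustration (real speech recognizers used acoustic observations, not words):

      states = ["Noun", "Verb"]
      start  = {"Noun": 0.6, "Verb": 0.4}
      trans  = {"Noun": {"Noun": 0.3, "Verb": 0.7}, "Verb": {"Noun": 0.8, "Verb": 0.2}}
      emit   = {"Noun": {"dogs": 0.5, "bark": 0.1, "runs": 0.4},
                "Verb": {"dogs": 0.1, "bark": 0.6, "runs": 0.3}}

      def viterbi(obs):
          # v[t][s] = probability of the best state path ending in state s at time t
          v = [{s: start[s] * emit[s][obs[0]] for s in states}]
          back = [{}]
          for t in range(1, len(obs)):
              v.append({})
              back.append({})
              for s in states:
                  prob, prev = max((v[t - 1][p] * trans[p][s] * emit[s][obs[t]], p) for p in states)
                  v[t][s] = prob
                  back[t][s] = prev
          best_last = max(v[-1], key=v[-1].get)     # most probable final state
          path = [best_last]
          for t in range(len(obs) - 1, 0, -1):      # follow the back-pointers
              path.append(back[t][path[-1]])
          return list(reversed(path))

      print(viterbi(["dogs", "bark"]))   # ['Noun', 'Verb']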
  • 1990: IBM's STATISTICAL APPROACH TO MACHINE TRANSLATION

    In 1990, IBM researchers introduced a statistical approach to machine translation, marking a significant departure from the previously dominant rule-based methods. This method laid the foundation for subsequent statistical machine translation techniques.
    Paper: A Statistical Approach to Machine Translation
  • 1993: DATA-DRIVEN APPROACHES & THE PENN TREEBANK

    In 1993, researchers from the University of Pennsylvania introduced the Penn Treebank project. This substantial effort aimed to construct a large corpus in which sentences were annotated with syntactic and part-of-speech information. The availability of this annotated data empowered machine learning models in NLP, promoting data-driven approaches and facilitating algorithm training on genuine linguistic data.
    Paper: Building a Large Annotated Corpus of English: The Penn Treebank 
  • 1994: DECISION LISTS FOR LEXICAL AMBIGUITY RESOLUTION

    In 1994, David Yarowsky introduced a statistical decision procedure for lexical ambiguity resolution. The approach utilized both local syntactic patterns and distant collocational evidence, offering an effective method for addressing ambiguities such as restoring missing accents in Spanish and French text.
    Paper: Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
  • 1994: MACHINE LEARNING FOR NLP: MEMORY-BASED LEARNING

    Memory-based learning, an instance-based methodology, relies on decisions made from stored training examples rather than following abstract rules. Walter Daelemans and his team highlighted its potential for NLP tasks in 1994. This represented an integration of machine learning techniques into linguistic processes, showcasing the increasing synergy between machine learning and NLP.
    Paper: Memory-Based Lexical Acquisition and Processing 
  • 1995: CENTERING THEORY

    Barbara Grosz, Aravind Joshi, and Scott Weinstein formally introduced the Centering Theory in a 1995 paper. It aimed to model the local coherence of discourse, focusing on relationships among attentional focus, choice of referring expressions, and perceived coherence of utterances within a discourse segment. The ideas had been in circulation and development since the early '80s.
    Paper: Centering: A Framework for Modelling the Local Coherence of Discourse
  • 1996: PROBABILISTIC MODELS & MAXIMUM ENTROPY

    The mid-1990s marked a significant shift towards probabilistic models in NLP. A landmark in this transition was the adoption of the Maximum Entropy principle, which optimizes model parameters based on the maximum likelihood while respecting constraints from the data. In their 1996 paper, Berger et al. showcased the versatility and efficiency of Maximum Entropy by applying it to various NLP tasks.
    Paper: A Maximum Entropy Approach to Natural Language Processing 
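    A Maximum Entropy classifier with bag-of-words feature constraints is mathematically equivalent to multinomial logistic regression, so a rough modern sketch can use scikit-learn; the tiny labeled corpus below is invented purely for illustration:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression

      texts  = ["the striker scored a goal", "the team won the match",
                "the senate passed the bill", "the minister gave a speech"]
      labels = ["sports", "sports", "politics", "politics"]

      # Word counts act as the feature functions whose expectations constrain the model.
      vectorizer = CountVectorizer()
      X = vectorizer.fit_transform(texts)

      # Multinomial logistic regression == maximum entropy classification.
      clf = LogisticRegression(max_iter=1000)
      clf.fit(X, labels)

      print(clf.predict(vectorizer.transform(["the goalkeeper saved the match"])))  # likely ['sports']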
  • 1997: STATISTICAL PARSING

    By 1997, the application of statistical methods was revolutionizing syntactic analysis. Michael Collins introduced generative, lexicalized models for statistical parsing, utilizing data-driven methods that markedly improved accuracy rates in understanding the structure of sentences. His work set new standards for how syntactic parsing was approached.
    Paper: Three Generative, Lexicalised Models for Statistical Parsing
  • 2000: MAXIMUM ENTROPY MARKOV MODELS (MEMMs) FOR SEQUENCE MODELING

    In 2000, McCallum and colleagues introduced MEMMs, a method that combined the Maximum Entropy principle with sequence modeling. MEMMs represented a leap in sequence tagging tasks, such as part-of-speech tagging, by effectively incorporating both past context and rich feature representations.
    Paper: Maximum Entropy Markov Models for Information Extraction and Segmentation
  • 2001: NAMED ENTITY RECOGNITION USING LINEAR CHAIN CRF

    Conditional Random Fields (CRFs) were introduced by John Lafferty, Andrew McCallum, and Fernando Pereira as a robust statistical framework for sequence labeling, addressing weaknesses of prior methods such as MEMMs (notably the label bias problem). Linear-chain CRFs quickly became a standard approach to named entity recognition, improving the precision of extracting names, places, and other specific entities from text.
    Paper: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
  • 2001: CO-TRAINING METHODS FOR STATISTICAL PARSING

    Anoop Sarkar introduced a Co-Training method for statistical parsing that utilized both labeled and unlabeled data. His approach combined a small corpus annotated with parse trees and a vast pool of unlabeled text. The iterative labeling process demonstrated that training a statistical parser using both labeled and unlabeled data significantly outperformed using only labeled data, pushing forward the importance of semi-supervised learning techniques in NLP.
    Paper: Applying Co-Training methods to Statistical Parsing
  • 2002: MAXIMUM ENTROPY PARSING

    Adwait Ratnaparkhi pioneered the use of Maximum Entropy models for syntactic analysis, applying them first to part-of-speech tagging and then to full statistical parsing. This approach blended rich feature representations with data-driven modeling, significantly enhancing the accuracy of automated sentence-structure analysis.
    Paper: A Maximum Entropy Model for Part-Of-Speech Tagging 
  • 2003: LATENT DIRICHLET ALLOCATION AND TOPIC MODELING

    Latent Dirichlet Allocation (LDA) presented by Blei and colleagues opened new avenues in extracting topics from vast text collections. By allowing texts to exhibit multiple topics, LDA brought about a more nuanced and granular understanding of content, changing the landscape of topic modeling.
    Paper: Latent Dirichlet Allocation
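    A brief sketch of fitting LDA with scikit-learn; the four documents and the choice of two topics are invented for illustration:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      docs = ["stocks rise as markets rally", "investors buy shares and bonds",
              "the team scored in the final minute", "fans cheered the winning goal"]

      # LDA operates on raw term counts rather than tf-idf weights.
      vectorizer = CountVectorizer(stop_words="english")
      counts = vectorizer.fit_transform(docs)

      lda = LatentDirichletAllocation(n_components=2, random_state=0)
      doc_topics = lda.fit_transform(counts)        # per-document topic mixtures

      words = vectorizer.get_feature_names_out()
      for k, topic in enumerate(lda.components_):   # top words characterizing each topic
          print(f"topic {k}:", [words[i] for i in topic.argsort()[-4:]])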
  • 2003: NEURAL NETWORK LANGUAGE MODELS

    Bengio's team showcased the potential of neural networks in language modeling. This foundational work signified a departure from traditional n-gram based models, highlighting neural networks' ability to capture intricate language patterns, marking an essential step towards the deep learning era in NLP.
    Paper: A Neural Probabilistic Language Model
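    A sketch of a Bengio-style feed-forward language model in PyTorch: embed the previous words, concatenate the embeddings, and predict the next word. The vocabulary size, dimensions, and input ids are placeholders rather than values from the original paper:

      import torch
      import torch.nn as nn

      class FeedForwardLM(nn.Module):
          """Predict the next word from the previous `context` words."""
          def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, context=2):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, embed_dim)
              self.hidden = nn.Linear(context * embed_dim, hidden_dim)
              self.out = nn.Linear(hidden_dim, vocab_size)

          def forward(self, context_ids):                  # shape: (batch, context)
              e = self.embed(context_ids).flatten(1)       # concatenated embeddings
              return self.out(torch.tanh(self.hidden(e)))  # logits over the vocabulary

      model = FeedForwardLM(vocab_size=10)
      logits = model(torch.tensor([[3, 7]]))               # ids of the two previous words
      print(logits.argmax(dim=-1))                         # most likely next word id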
  • 2004: SEQUENCE LABELING WITH SEMI-MARKOV CRFs

    Sarawagi and Cohen enhanced the CRFs framework by introducing a semi-Markov variant. This version was adept at modeling variable-length sequence segments, refining the precision of tasks like information extraction and entity recognition.
    Paper: Semi-Markov Conditional Random Fields for Information Extraction
  • 2005: SEMANTIC ROLE LABELING

    Palmer and her team emphasized Semantic Role Labeling (SRL), a task that detects the semantic roles within a sentence. By shedding light on the relationships between verbs and their arguments, SRL enriched the depth of semantic text analysis, paving the way for better text understanding.
    Paper: The Proposition Bank: An Annotated Corpus of Semantic Roles
  • 2006: INITIATION OF DEEP LEARNING WITH NEURAL NETWORKS

    Hinton and Salakhutdinov introduced an effective method of initializing weights for deep autoencoder networks, allowing them to learn low-dimensional codes that significantly outperformed traditional methods like principal components analysis. This pioneering work laid down foundational principles for deep learning, which would later be embraced extensively in NLP and other fields.
    Paper: Reducing the Dimensionality of Data with Neural Networks
  • 2008: SEMI-SUPERVISED LEARNING IN NLP

    Suzuki and Isozaki demonstrated the potential of combining modest-sized labeled datasets with enormous unlabeled datasets in several NLP tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition. Their semi-supervised method not only capitalized on the availability of large amounts of unlabeled data but also set new performance standards on widely-used benchmarks.
    Paper: Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data
  • 2010: RECURRENT NEURAL NETWORKS FOR LANGUAGE MODELING

    Mikolov's introduction of recurrent neural networks (RNNs) for language modeling marked a major stride in handling sequences in NLP. By allowing models to remember past information, RNNs became the cornerstone for developments like sequence-to-sequence models, revolutionizing tasks like translation and text generation.
    Paper: Recurrent neural network based language model
  • 2012: DEEP LEARNING FOR SEQUENCE MODELING

    Before deep learning, sequence modeling, especially handwriting recognition, was a challenging endeavor. Alex Graves and team demonstrated that recurrent neural networks (RNNs), especially the LSTM variant, can effectively model complex sequences. This discovery provided momentum to LSTMs' adoption in various NLP tasks, from text generation to sentiment classification.
    Paper: Sequence Transduction with Recurrent Neural Networks
  • 2013: WORD EMBEDDINGS AND DISTRIBUTED REPRESENTATIONS

    Tomas Mikolov and his team at Google revolutionized semantic understanding in NLP with the introduction of word2vec. This method represented words in high-dimensional vector spaces, enabling algorithms to discern semantic relationships between words based purely on their positions in this space. It changed the landscape of NLP, leading to breakthroughs in numerous applications such as machine translation, sentiment analysis, and information retrieval.
    Paper: Efficient Estimation of Word Representations in Vector Space
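    A short sketch using the third-party gensim library to train a skip-gram word2vec model; the corpus is a handful of invented sentences, far too small for meaningful neighbors, and is only meant to show the shape of the approach:

      from gensim.models import Word2Vec

      sentences = [
          ["the", "king", "rules", "the", "kingdom"],
          ["the", "queen", "rules", "the", "kingdom"],
          ["the", "dog", "chases", "the", "ball"],
          ["the", "puppy", "chases", "the", "ball"],
      ]

      # sg=1 selects the skip-gram objective; parameters are illustrative, not tuned.
      model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

      print(model.wv["king"][:5])                    # first few components of a word vector
      print(model.wv.most_similar("king", topn=2))   # nearest neighbors in the vector space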
  • 2014: SEQUENCE-TO-SEQUENCE LEARNING

    The sequence-to-sequence model, presented by Sutskever and team, provided a novel way of managing tasks that deal with input and output sequences, like translating a sentence from one language to another. By using two LSTMs, one for encoding the input and the other for decoding into the output, this model set the standard for a range of applications including machine translation and automated summarization.
    Paper: Sequence to Sequence Learning with Neural Networks
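    A condensed PyTorch sketch of the two-LSTM encoder-decoder idea: the encoder's final state summarizes the source sentence and initializes the decoder. Vocabulary sizes, dimensions, and the random token ids are placeholders, and real systems add a training loop and beam-search decoding:

      import torch
      import torch.nn as nn

      class Seq2Seq(nn.Module):
          """Encoder LSTM compresses the source; decoder LSTM generates the target."""
          def __init__(self, src_vocab, tgt_vocab, embed_dim=32, hidden_dim=64):
              super().__init__()
              self.src_embed = nn.Embedding(src_vocab, embed_dim)
              self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
              self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
              self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
              self.out = nn.Linear(hidden_dim, tgt_vocab)

          def forward(self, src_ids, tgt_ids):
              _, state = self.encoder(self.src_embed(src_ids))   # (h, c) summarizes the source
              dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
              return self.out(dec_out)                           # logits for each target position

      model = Seq2Seq(src_vocab=100, tgt_vocab=100)
      src = torch.randint(0, 100, (1, 6))   # one source sentence of 6 token ids
      tgt = torch.randint(0, 100, (1, 5))   # teacher-forced target prefix
      print(model(src, tgt).shape)          # torch.Size([1, 5, 100])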
  • 2015: SKIP-THOUGHT VECTORS

    Building on the success of word embeddings, Kiros and colleagues introduced an approach to generate sentence embeddings. Named skip-thought vectors, this technique represented entire sentences in vector spaces, offering a way to measure semantic similarity between sentences and thus greatly improving document-level understanding and tasks like text classification and clustering.
    Paper: Skip-Thought Vectors
  • 2015: THE RISE OF CHATBOTS

    As the digital world expanded, demand grew for more sophisticated chatbots. Vinyals and Le at Google took on this challenge, showcasing a model that employed deep learning, specifically sequence-to-sequence LSTMs, to handle conversational data. Their work provided the foundation for the next generation of chatbots, dialogue systems, and other conversational AI applications.
    Paper: A Neural Conversational Model
  • 2017: TRANSFORMERS AND ATTENTION MECHANISMS

    The Transformer architecture, proposed by Vaswani and colleagues at Google, was a turning point in deep learning for NLP. By emphasizing the importance of attention mechanisms, which weigh input features differently, the Transformer model sidestepped the limitations of sequence-based approaches like RNNs. Its flexibility and efficiency became the foundation for several influential models like BERT, GPT, and more.
    Paper: Attention Is All You Need
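    The heart of the Transformer is scaled dot-product attention: each query attends to every key, and the resulting softmax weights mix the corresponding values. A minimal NumPy rendering, with random matrices standing in for learned query/key/value projections:

      import numpy as np

      def scaled_dot_product_attention(Q, K, V):
          """softmax(Q K^T / sqrt(d_k)) V, computed for every query row."""
          d_k = Q.shape[-1]
          scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
          return weights @ V                                    # weighted sum of values

      rng = np.random.default_rng(0)
      Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
      K = rng.normal(size=(6, 8))   # 6 key positions
      V = rng.normal(size=(6, 8))   # one value vector per key
      print(scaled_dot_product_attention(Q, K, V).shape)        # (4, 8)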
  • 2018: GPT: GENERATIVE PRE-TRAINED TRANSFORMER

    OpenAI's GPT showcased the potential of the Transformer architecture in generating coherent and contextually relevant text passages. As a language model, GPT was not just an evolution in scale but also in capability, effectively generating text that was often indistinguishable from human writing in various tasks.
    Paper: Improving Language Understanding by Generative Pre-Training
  • 2019: NEURAL MACHINE TRANSLATION MILESTONE

    The quest for effective machine translation saw a significant advancement with Facebook AI's LASER. By offering multilingual sentence embeddings, LASER facilitated zero-shot cross-lingual transfer, allowing for translations even in language pairs with limited direct translation data.
    Paper: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
  • 2020: GPT-3 AND FEW-SHOT LEARNING

    OpenAI unveiled GPT-3, setting a new benchmark in the NLP realm. With a staggering 175 billion parameters, GPT-3 showcased the ability to perform tasks using few-shot learning, demonstrating tasks with minimal instruction and without needing explicit task-specific training data, highlighting the model's versatility and adaptability.
    Paper: Language Models are Few-Shot Learners
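    Few-shot learning here means supplying worked examples directly in the prompt instead of fine-tuning the model; a schematic, entirely invented sentiment-classification prompt might look like this:

      # Hypothetical few-shot prompt: two labeled examples, then the query to complete.
      examples = [
          ("The plot was gripping and the acting superb.", "positive"),
          ("I walked out halfway through, utterly bored.", "negative"),
      ]
      query = "A delightful surprise from start to finish."

      prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
      prompt += f"\n\nReview: {query}\nSentiment:"

      print(prompt)  # a model would be expected to continue with " positive"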
