History Of Natural Language Processing

  • ANCIENT HISTORY TO 1700S: FOUNDATIONS OF LINGUISTICS

    Long before the age of computers, the study of language and linguistics was fundamental to the development of NLP. Scholars like Panini in ancient India, who composed a comprehensive grammar of Sanskrit, laid the foundations of structural linguistics that would later inspire computational techniques.
    Book: The Ashtadhyayi Of Panini
  • 1800S - EARLY 1900S: MATHEMATICAL AND PHILOSOPHICAL FOUNDATIONS

    The formal study of logic, mathematics, and philosophy, especially the work of George Boole and his Boolean algebra, laid the groundwork for the symbolic representation of logical statements. This made it feasible to envision computation with language.
    Book: An Investigation of the Laws of Thought by George Boole
  • 1930s: TURING'S UNIVERSAL MACHINE

    Alan Turing introduced the concept of a universal machine, later known as the Turing machine. This theoretical construct formed the basis for how we understand computation and algorithms today. Turing's work was foundational not just for computer science but also for imagining the possibility of machines processing language.
    Paper: On Computable Numbers, with an Application to the Entscheidungsproblem by Alan Turing [1936]
  • 1940s: FIRST ATTEMPTS AT MACHINE TRANSLATION

    In the wake of World War II, the idea of automatic translation between languages (particularly Russian to English) became of significant interest in the United States. Early experiments in this decade laid the groundwork for what would later become the field of machine translation.

  • 1945: VANNEVAR BUSH'S "MEMEX"

    Vannevar Bush proposed a hypothetical device called "Memex" in his essay "As We May Think". This machine would allow users to create links between documents, much like hyperlinks on the web today, anticipating the future of information retrieval and the importance of context in understanding language.
    Resource: As We May Think by Vannevar Bush
    
    Book: The Essential Writings of Vannevar Bush
  • 1946: ENIAC – ELECTRONIC NUMERICAL INTEGRATOR AND COMPUTER

    The ENIAC was the first general-purpose electronic digital computer. While it was not used for NLP tasks, its existence and the developments surrounding it paved the way for the feasibility of computational tasks, including those related to language.
    Book: ENIAC in Action: Making and Remaking the Modern Computer
    
    Wiki: ENIAC
  • 1950: TURING TEST

    Alan Turing proposed a measure of machine intelligence where a machine is said to exhibit human-like intelligence if it can imitate human responses under specific conditions. This test became a foundational concept for artificial intelligence and, by extension, natural language processing.
    Paper: Computing Machinery and Intelligence
    
    Book: The Turing Test: Verbal Behavior as the Hallmark of Intelligence
  • 1951: UNIVAC I – THE FIRST COMMERCIAL COMPUTER

    The Universal Automatic Computer I (UNIVAC I) was the first commercially available computer. It marked the beginning of a new era where computing resources would be more widely accessible, opening the door for advancements in multiple fields, including natural language processing.
    Wiki: UNIVAC I
  • 1954: GEORGETOWN-IBM EXPERIMENT

    This was an early demonstration of a machine translation system, where more than 60 Russian sentences were automatically translated into English. Sponsored by the U.S. government, it was one of the first public displays of machine translation capabilities.
    Research Paper: The first public demonstration of machine translation: the Georgetown-IBM system
  • 1956: GRAMMARS AND CHOMSKY'S HIERARCHY

    Noam Chomsky, in his work "Three Models for the Description of Language", introduced different types of grammars (Type 0, Type 1, Type 2, Type 3) which later came to be known as Chomsky's Hierarchy. These grammars provided a structured way to think about languages and their complexity, greatly influencing NLP.
    Research Paper: Three Models For The Description Of Language
  • 1957: THE PERCEPTRON

    Frank Rosenblatt introduced the Perceptron, an algorithm for supervised learning of binary classifiers. Though initially not applied directly to NLP, the ideas from the Perceptron would later influence models in NLP, especially with the rise of neural networks.
    Book: The Perceptron [Cornell]
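    A minimal sketch of the perceptron learning rule, shown here on an invented toy problem (learning the logical AND function); the data, learning rate, and epoch count are illustrative choices, not from Rosenblatt's original report:

      # Toy perceptron: weights are nudged only when an example is misclassified.
      data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # inputs -> AND label

      w = [0.0, 0.0]   # weights
      b = 0.0          # bias
      lr = 0.1         # learning rate

      for epoch in range(10):
          for x, target in data:
              activation = w[0] * x[0] + w[1] * x[1] + b
              prediction = 1 if activation > 0 else 0
              error = target - prediction          # -1, 0, or +1
              w[0] += lr * error * x[0]
              w[1] += lr * error * x[1]
              b += lr * error

      print(w, b)  # converges to a separating line for AND, e.g. w = [0.2, 0.1], b = -0.2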
  • 1960: ALGOL 60 AND FORMAL GRAMMARS

    The ALGOL 60 report introduced the use of BNF (Backus-Naur Form), a notation to describe formal grammars. Though intended for programming languages, the use of BNF laid foundational ideas for formalizing natural language syntax in NLP.
    Report: Report on the Algorithmic Language ALGOL 60
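    To make the idea of BNF-style production rules concrete, here is a tiny, made-up context-free grammar (not the ALGOL 60 grammar) encoded as a Python dictionary, with a generator that expands non-terminals recursively:

      import random

      # Toy grammar in the spirit of BNF:
      #   <S> ::= <NP> <VP>     <NP> ::= "the" <N>     <VP> ::= <V> <NP>
      grammar = {
          "S":  [["NP", "VP"]],
          "NP": [["the", "N"]],
          "VP": [["V", "NP"]],
          "N":  [["parser"], ["grammar"], ["sentence"]],
          "V":  [["accepts"], ["rejects"]],
      }

      def generate(symbol):
          """Expand a symbol by picking one alternative per non-terminal."""
          if symbol not in grammar:                 # terminal: emit as-is
              return [symbol]
          expansion = random.choice(grammar[symbol])
          return [word for part in expansion for word in generate(part)]

      print(" ".join(generate("S")))   # e.g. "the parser rejects the sentence"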
  • 1961: BASEBALL

    The "BASEBALL" program, developed by Bert Green and colleagues at MIT Lincoln Laboratory, was one of the first question-answering systems: it could answer questions posed in English about a database of baseball statistics. It was an early demonstration of the potential for computers to understand and respond to user queries about a specific domain, and it is cited in Weizenbaum's 1966 ELIZA paper.

  • 1968-1970: SHRDLU

    Terry Winograd created SHRDLU at MIT between 1968 and 1970, an early natural language processing computer program. It could understand and execute instructions given in natural language concerning a virtual world of toy blocks, combining the understanding and generation of natural language with a reasoning component.
    Book: Language As a Cognitive Process: Syntax
  • 1964: SYNTACTIC PATTERN RECOGNITION

    Rohit Parikh proposed the application of syntactic (or structural) pattern recognition to the problem of optical character recognition, setting the stage for applications of formal grammar and automata theory in pattern recognition, which would later be applied in NLP tasks.
    Wiki: Parikh's Theorem
  • 1964: STUDENT, DEFDEF, AND SIR

    Daniel Bobrow's doctoral dissertation at MIT introduced the program called "STUDENT", which could solve algebra word problems stated in English. It was accompanied by related systems such as DEFDEF (a definition-based question-answering system) and Bertram Raphael's SIR (Semantic Information Retrieval), developed or refined around the same period.
    Research Paper: Natural Language Input For A Computer Problem Solving System
  • 1964: BROWN CORPUS RELEASE

    The Brown University Standard Corpus of Present-Day American English, commonly referred to as the Brown Corpus, was a landmark in NLP. Compiled in the early 1960s and released in 1964, this corpus of roughly one million English words became a foundation for statistical NLP methods.
    Manual: Brown Corpus Manual
  • 1965: SEMANTIC NETWORKS

    Ross Quillian proposed the concept of semantic networks, a graphical notation for representing knowledge in patterns of interconnected nodes and arcs. These networks became foundational for knowledge representation in artificial intelligence and NLP.
    Book: Quillian, M. Ross. (1966). Semantic Memory. Semantic Information Processing.
    Book: Quillian, M. Ross. (1969). The Teachable Language Comprehender: A Simulation Program and Theory of Language. Communications of the ACM.
  • 1964-1966: ELIZA

    Joseph Weizenbaum developed "ELIZA" in the mid-60s at MIT, a computer program designed to emulate a Rogerian psychotherapist. Using pattern matching and substitution methodologies, ELIZA provided a semblance of understanding, marking one of the earliest attempts at creating a chatbot.
    Research Paper: ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine (Communications of the ACM, 1966)
  • 1966: ALPAC REPORT

    The Automatic Language Processing Advisory Committee (ALPAC) was formed by the U.S. government to evaluate the state and progress of machine translation research. The committee's report was largely negative and led to a significant reduction in funding for machine translation research in the U.S.
    Report: ALPAC: the (in)famous report
    Wiki: ALPAC
  • 1969: DENDRAL

    Edward Feigenbaum, Joshua Lederberg, and Bruce Buchanan at Stanford University developed DENDRAL, the first expert system. Initially designed for organic chemistry, inferring molecular structure from mass spectrometer data, its approach and techniques became influential in AI and knowledge representation. While not strictly NLP, it is closely related in the broader context of AI's evolution.
    Paper: DENDRAL: A Case Study Of The First Expert System For Scientific Hypothesis Formation
  • 1970: AUGMENTED TRANSITION NETWORKS (ATNs)

    William Woods introduced Augmented Transition Networks (ATNs) as a form of grammar for natural language processing. ATNs were powerful in representing semantic information and became widely used in early NLP systems.
    Research Paper: Transition Network Grammars for Natural Language Analysis
  • 1970s: MORPHOLOGICAL PROCESSING

    Morphological processing pertains to the structure and formation of words. Martin Kay played a pivotal role in the early '70s by developing advanced algorithms for morphological analysis. His work laid the groundwork for systems to recognize and process different word forms, enhancing the efficiency of language models and their understanding of word variations.

  • 1972: LUNAR SYSTEM

    The LUNAR system, developed by William Woods and colleagues at BBN, was among the first natural language interfaces designed for databases. This was groundbreaking because it allowed scientists to query a database of Apollo moon rock samples using plain English, showcasing the potential of NLP in practical applications.
    Paper: The Lunar Sciences Natural Language Information System: Final Report
  • 1973: SCRIPT THEORY

    Roger Schank and Robert Abelson's "script" concept provided a structured representation of sequences in stories. This was integral for machines to comprehend narratives. By identifying and categorizing common sequences in stories, NLP systems could predict and understand a range of scenarios and contexts within human narratives.
    Book: Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures
    
    Paper: Scripts, Plans, and Knowledge
  • 1973: WINOGRAD’S SHRDLU COMPLETED

    Terry Winograd's SHRDLU system, completed in 1973, could comprehend and formulate English sentences in a simulated "blocks world". This achievement demonstrated how computers could understand and interact using natural language in a confined environment, marking an early success in the journey towards conversational AI.
    Paper: SHRDLU
  • 1975: CONCEPTUAL DEPENDENCY THEORY

    Roger Schank introduced Conceptual Dependency Theory for natural language understanding in his book "Conceptual Information Processing". This theory proposed that there are a limited number of primitive actions that can represent all possible actions in terms of their conceptual meaning.
    Book: Conceptual Information Processing by R. C. Schank
  • 1975: LEXICAL FUNCTIONAL GRAMMAR

    Ron Kaplan and Joan Bresnan introduced "Lexical Functional Grammar", emphasizing the importance of lexicon and syntactic structures in language processing. This shift in perspective from traditional grammar theories made it more amenable for computational implementations, influencing future NLP systems.
    Paper: Lexical-Functional Grammar: A Formal System for Grammatical Representation
    
    Book: Lexical-Functional Grammar: An Introduction
  • 1977: GUS: A FRAME-DRIVEN DIALOG SYSTEM

    Co-authored by Martin Kay and colleagues at Xerox PARC, the GUS system was an early example of a frame-driven dialogue system, carrying out a simple travel-booking conversation in English. Its frame-based approach was a notable development of the era, showing how structured data representations could be employed to manage and interpret user input in natural language processing tasks.
    Paper: GUS, A Frame-Driven Dialog System
  • 1982: CHAT-80

    CHAT-80, developed by David Warren and Fernando Pereira, could interpret questions about world geography, translate them into Prolog queries, and return the answers in English. This innovative system showcased the potential of NLP to be integrated with database management, providing natural language interfaces for users.
    Paper: An Efficient Easily Adaptable System for Interpreting Natural Language Queries
  • 1979: FUNCTIONAL GRAMMAR

    Martin Kay's work on functional grammar provided insights into how linguistic structures can be analyzed and represented. This work laid foundations for subsequent studies in functional approaches to grammar, emphasizing the roles that different elements of sentences play in conveying meaning.
    Paper: Functional Grammar
  • 1980s: FRAME-BASED SYSTEMS FOR NLP

    In the early 1980s, there was a move towards frame-based systems in NLP, which represent stereotyped situations. Minsky's "Frames" paper, although written in the 70s, had a substantial influence on this trend.
    Paper: A Framework For Representing Knowledge by Marvin Minsky [1974]
  • 1980: MORPHOLOGICAL PROCESSING

    During the early 1980s, there was significant progress in morphological processing, which deals with understanding the structure of words. This was essential in the development of early NLP systems that aimed to understand and generate human language.
    Book: Finite State Morphology
  • 1986: CONNECTIONIST APPROACHES TO LANGUAGE

    This period saw a surge in the popularity of connectionist models, also known as neural networks, for NLP tasks. These models were different from symbolic AI models that were popular at the time. Rumelhart, Hinton, and Williams' work on backpropagation was influential in popularizing these models.
    Paper: Learning Representations by Back-Propagating Errors
  • 1990: HIDDEN MARKOV MODELS FOR SPEECH RECOGNITION

    By the late 80s and early 90s, Hidden Markov Models (HMMs) became the dominant approach for speech recognition, revolutionizing the field. These probabilistic models provided a framework for handling uncertainty in spoken language.
    Book: Fundamentals of Speech Recognition
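    A compact sketch of Viterbi decoding for a toy two-state HMM; the states, vocabulary, and probabilities below are invented for illustration (real speech recognizers used acoustic observations, not words):

      states = ["Noun", "Verb"]
      start  = {"Noun": 0.6, "Verb": 0.4}
      trans  = {"Noun": {"Noun": 0.3, "Verb": 0.7}, "Verb": {"Noun": 0.8, "Verb": 0.2}}
      emit   = {"Noun": {"dogs": 0.5, "bark": 0.1, "runs": 0.4},
                "Verb": {"dogs": 0.1, "bark": 0.6, "runs": 0.3}}

      def viterbi(obs):
          # v[t][s] = probability of the best state path ending in state s at time t
          v = [{s: start[s] * emit[s][obs[0]] for s in states}]
          back = [{}]
          for t in range(1, len(obs)):
              v.append({})
              back.append({})
              for s in states:
                  prob, prev = max((v[t - 1][p] * trans[p][s] * emit[s][obs[t]], p) for p in states)
                  v[t][s] = prob
                  back[t][s] = prev
          best_last = max(v[-1], key=v[-1].get)     # most probable final state
          path = [best_last]
          for t in range(len(obs) - 1, 0, -1):      # follow the back-pointers
              path.append(back[t][path[-1]])
          return list(reversed(path))

      print(viterbi(["dogs", "bark"]))   # ['Noun', 'Verb']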
  • 1990: IBM's STATISTICAL APPROACH TO MACHINE TRANSLATION

    In 1990, IBM researchers introduced a statistical approach to machine translation, marking a significant departure from the previously dominant rule-based methods. This method laid the foundation for subsequent statistical machine translation techniques.
    Paper: A Statistical Approach to Machine Translation
  • 1993: DATA-DRIVEN APPROACHES & THE PENN TREEBANK

    In 1993, researchers from the University of Pennsylvania introduced the Penn Treebank project. This substantial effort aimed to construct a large corpus in which sentences were annotated with syntactic and part-of-speech information. The availability of this annotated data empowered machine learning models in NLP, promoting data-driven approaches and facilitating algorithm training on genuine linguistic data.
    Paper: Building a Large Annotated Corpus of English: The Penn Treebank 
  • 1994: DECISION LISTS FOR LEXICAL AMBIGUITY RESOLUTION

    In 1994, David Yarowsky introduced a statistical decision procedure for lexical ambiguity resolution. The approach utilized both local syntactic patterns and distant collocational evidence, offering an effective method for addressing ambiguities such as restoring missing accents in Spanish and French text.
    Paper: Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
  • 1994: MACHINE LEARNING FOR NLP: MEMORY-BASED LEARNING

    Memory-based learning, an instance-based methodology, relies on decisions made from stored training examples rather than following abstract rules. Walter Daelemans and his team highlighted its potential for NLP tasks in 1994. This represented an integration of machine learning techniques into linguistic processes, showcasing the increasing synergy between machine learning and NLP.
    Paper: Memory-Based Lexical Acquisition and Processing 
  • 1995: CENTERING THEORY

    Barbara Grosz, Aravind Joshi, and Scott Weinstein formally introduced the Centering Theory in a 1995 paper. It aimed to model the local coherence of discourse, focusing on relationships among attentional focus, choice of referring expressions, and perceived coherence of utterances within a discourse segment. The ideas had been in circulation and development since the early '80s.
    Paper: Centering: A Framework for Modelling the Local Coherence of Discourse
  • 1996: PROBABILISTIC MODELS & MAXIMUM ENTROPY

    The mid-1990s marked a significant shift towards probabilistic models in NLP. A landmark in this transition was the adoption of the Maximum Entropy principle, which optimizes model parameters based on the maximum likelihood while respecting constraints from the data. In their 1996 paper, Berger et al. showcased the versatility and efficiency of Maximum Entropy by applying it to various NLP tasks.
    Paper: A Maximum Entropy Approach to Natural Language Processing 
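    A Maximum Entropy classifier with bag-of-words feature constraints is mathematically equivalent to multinomial logistic regression, so a rough modern sketch can use scikit-learn; the tiny labeled corpus below is invented purely for illustration:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.linear_model import LogisticRegression

      texts  = ["the striker scored a goal", "the team won the match",
                "the senate passed the bill", "the minister gave a speech"]
      labels = ["sports", "sports", "politics", "politics"]

      # Word counts act as the feature functions whose expectations constrain the model.
      vectorizer = CountVectorizer()
      X = vectorizer.fit_transform(texts)

      # Multinomial logistic regression == maximum entropy classification.
      clf = LogisticRegression(max_iter=1000)
      clf.fit(X, labels)

      print(clf.predict(vectorizer.transform(["the goalkeeper saved the match"])))  # likely ['sports']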
  • 1997: STATISTICAL PARSING

    By 1997, the application of statistical methods was revolutionizing syntactic analysis. Michael Collins introduced generative, lexicalized models for statistical parsing, utilizing data-driven methods that markedly improved accuracy rates in understanding the structure of sentences. His work set new standards for how syntactic parsing was approached.
    Paper: Three Generative, Lexicalised Models for Statistical Parsing
  • 2000: MAXIMUM ENTROPY MARKOV MODELS (MEMMs) FOR SEQUENCE MODELING

    In 2000, McCallum and colleagues introduced MEMMs, a method that combined the Maximum Entropy principle with sequence modeling. MEMMs represented a leap in sequence tagging tasks, such as part-of-speech tagging, by effectively incorporating both past context and rich feature representations.
    Paper: Maximum Entropy Markov Models for Information Extraction and Segmentation
  • 2001: NAMED ENTITY RECOGNITION USING LINEAR CHAIN CRF

    Conditional Random Fields (CRFs) were introduced by John Lafferty, Andrew McCallum, and Fernando Pereira as a robust statistical framework for sequence labeling, addressing weaknesses of prior methods such as MEMMs (notably the label bias problem). Linear-chain CRFs quickly became a standard approach to named entity recognition, improving the precision of extracting names, places, and other specific entities from text.
    Paper: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
  • 2001: CO-TRAINING METHODS FOR STATISTICAL PARSING

    Anoop Sarkar introduced a Co-Training method for statistical parsing that utilized both labeled and unlabeled data. His approach combined a small corpus annotated with parse trees and a vast pool of unlabeled text. The iterative labeling process demonstrated that training a statistical parser using both labeled and unlabeled data significantly outperformed using only labeled data, pushing forward the importance of semi-supervised learning techniques in NLP.
    Paper: Applying Co-Training methods to Statistical Parsing
  • 2002: MAXIMUM ENTROPY PARSING

    Adwait Ratnaparkhi pioneered the use of Maximum Entropy models for syntactic analysis, applying them first to part-of-speech tagging and then to full statistical parsing. This approach blended rich feature representations with data-driven modeling, significantly enhancing the accuracy of automated sentence-structure analysis.
    Paper: A Maximum Entropy Model for Part-Of-Speech Tagging 
  • 2003: LATENT DIRICHLET ALLOCATION AND TOPIC MODELING

    Latent Dirichlet Allocation (LDA) presented by Blei and colleagues opened new avenues in extracting topics from vast text collections. By allowing texts to exhibit multiple topics, LDA brought about a more nuanced and granular understanding of content, changing the landscape of topic modeling.
    Paper: Latent Dirichlet Allocation
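    A brief sketch of fitting LDA with scikit-learn; the four documents and the choice of two topics are invented for illustration:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      docs = ["stocks rise as markets rally", "investors buy shares and bonds",
              "the team scored in the final minute", "fans cheered the winning goal"]

      # LDA operates on raw term counts rather than tf-idf weights.
      vectorizer = CountVectorizer(stop_words="english")
      counts = vectorizer.fit_transform(docs)

      lda = LatentDirichletAllocation(n_components=2, random_state=0)
      doc_topics = lda.fit_transform(counts)        # per-document topic mixtures

      words = vectorizer.get_feature_names_out()
      for k, topic in enumerate(lda.components_):   # top words characterizing each topic
          print(f"topic {k}:", [words[i] for i in topic.argsort()[-4:]])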
  • 2003: NEURAL NETWORK LANGUAGE MODELS

    Bengio's team showcased the potential of neural networks in language modeling. This foundational work signified a departure from traditional n-gram based models, highlighting neural networks' ability to capture intricate language patterns, marking an essential step towards the deep learning era in NLP.
    Paper: A Neural Probabilistic Language Model
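    A sketch of a Bengio-style feed-forward language model in PyTorch: embed the previous words, concatenate the embeddings, and predict the next word. The vocabulary size, dimensions, and input ids are placeholders rather than values from the original paper:

      import torch
      import torch.nn as nn

      class FeedForwardLM(nn.Module):
          """Predict the next word from the previous `context` words."""
          def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, context=2):
              super().__init__()
              self.embed = nn.Embedding(vocab_size, embed_dim)
              self.hidden = nn.Linear(context * embed_dim, hidden_dim)
              self.out = nn.Linear(hidden_dim, vocab_size)

          def forward(self, context_ids):                  # shape: (batch, context)
              e = self.embed(context_ids).flatten(1)       # concatenated embeddings
              return self.out(torch.tanh(self.hidden(e)))  # logits over the vocabulary

      model = FeedForwardLM(vocab_size=10)
      logits = model(torch.tensor([[3, 7]]))               # ids of the two previous words
      print(logits.argmax(dim=-1))                         # most likely next word id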
  • 2004: SEQUENCE LABELING WITH SEMI-MARKOV CRFs

    Sarawagi and Cohen enhanced the CRFs framework by introducing a semi-Markov variant. This version was adept at modeling variable-length sequence segments, refining the precision of tasks like information extraction and entity recognition.
    Paper: Semi-Markov Conditional Random Fields for Information Extraction
  • 2005: SEMANTIC ROLE LABELING

    Palmer and her team emphasized Semantic Role Labeling (SRL), a task that detects the semantic roles within a sentence. By shedding light on the relationships between verbs and their arguments, SRL enriched the depth of semantic text analysis, paving the way for better text understanding.
    Paper: The Proposition Bank: An Annotated Corpus of Semantic Roles
  • 2006: INITIATION OF DEEP LEARNING WITH NEURAL NETWORKS

    Hinton and Salakhutdinov introduced an effective method of initializing weights for deep autoencoder networks, allowing them to learn low-dimensional codes that significantly outperformed traditional methods like principal components analysis. This pioneering work laid down foundational principles for deep learning, which would later be embraced extensively in NLP and other fields.
    Paper: Reducing the Dimensionality of Data with Neural Networks
  • 2008: SEMI-SUPERVISED LEARNING IN NLP

    Suzuki and Isozaki demonstrated the potential of combining modest-sized labeled datasets with enormous unlabeled datasets in several NLP tasks, such as part-of-speech tagging, syntactic chunking, and named entity recognition. Their semi-supervised method not only capitalized on the availability of large amounts of unlabeled data but also set new performance standards on widely-used benchmarks.
    Paper: Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data
  • 2010: RECURRENT NEURAL NETWORKS FOR LANGUAGE MODELING

    Mikolov's introduction of recurrent neural networks (RNNs) for language modeling marked a major stride in handling sequences in NLP. By allowing models to remember past information, RNNs became the cornerstone for developments like sequence-to-sequence models, revolutionizing tasks like translation and text generation.
    Paper: Recurrent neural network based language model
  • 2012: DEEP LEARNING FOR SEQUENCE MODELING

    Before deep learning, sequence modeling, especially handwriting recognition, was a challenging endeavor. Alex Graves and team demonstrated that recurrent neural networks (RNNs), especially the LSTM variant, can effectively model complex sequences. This discovery provided momentum to LSTMs' adoption in various NLP tasks, from text generation to sentiment classification.
    Paper: Sequence Transduction with Recurrent Neural Networks
  • 2013: WORD EMBEDDINGS AND DISTRIBUTED REPRESENTATIONS

    Tomas Mikolov and his team at Google revolutionized semantic understanding in NLP with the introduction of word2vec. This method represented words in high-dimensional vector spaces, enabling algorithms to discern semantic relationships between words based purely on their positions in this space. It changed the landscape of NLP, leading to breakthroughs in numerous applications such as machine translation, sentiment analysis, and information retrieval.
    Paper: Efficient Estimation of Word Representations in Vector Space
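    A short sketch using the third-party gensim library to train a skip-gram word2vec model; the corpus is a handful of invented sentences, far too small for meaningful neighbors, and is only meant to show the shape of the approach:

      from gensim.models import Word2Vec

      sentences = [
          ["the", "king", "rules", "the", "kingdom"],
          ["the", "queen", "rules", "the", "kingdom"],
          ["the", "dog", "chases", "the", "ball"],
          ["the", "puppy", "chases", "the", "ball"],
      ]

      # sg=1 selects the skip-gram objective; parameters are illustrative, not tuned.
      model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

      print(model.wv["king"][:5])                    # first few components of a word vector
      print(model.wv.most_similar("king", topn=2))   # nearest neighbors in the vector space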
  • 2014: SEQUENCE-TO-SEQUENCE LEARNING

    The sequence-to-sequence model, presented by Sutskever and team, provided a novel way of managing tasks that deal with input and output sequences, like translating a sentence from one language to another. By using two LSTMs, one for encoding the input and the other for decoding into the output, this model set the standard for a range of applications including machine translation and automated summarization.
    Paper: Sequence to Sequence Learning with Neural Networks
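    A condensed PyTorch sketch of the two-LSTM encoder-decoder idea: the encoder's final state summarizes the source sentence and initializes the decoder. Vocabulary sizes, dimensions, and the random token ids are placeholders, and real systems add a training loop and beam-search decoding:

      import torch
      import torch.nn as nn

      class Seq2Seq(nn.Module):
          """Encoder LSTM compresses the source; decoder LSTM generates the target."""
          def __init__(self, src_vocab, tgt_vocab, embed_dim=32, hidden_dim=64):
              super().__init__()
              self.src_embed = nn.Embedding(src_vocab, embed_dim)
              self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
              self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
              self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
              self.out = nn.Linear(hidden_dim, tgt_vocab)

          def forward(self, src_ids, tgt_ids):
              _, state = self.encoder(self.src_embed(src_ids))   # (h, c) summarizes the source
              dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
              return self.out(dec_out)                           # logits for each target position

      model = Seq2Seq(src_vocab=100, tgt_vocab=100)
      src = torch.randint(0, 100, (1, 6))   # one source sentence of 6 token ids
      tgt = torch.randint(0, 100, (1, 5))   # teacher-forced target prefix
      print(model(src, tgt).shape)          # torch.Size([1, 5, 100])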
  • 2015: SKIP-THOUGHT VECTORS

    Building on the success of word embeddings, Kiros and colleagues introduced an approach to generate sentence embeddings. Named skip-thought vectors, this technique represented entire sentences in vector spaces, offering a way to measure semantic similarity between sentences and thus greatly improving document-level understanding and tasks like text classification and clustering.
    Paper: Skip-Thought Vectors
  • 2015: THE RISE OF CHATBOTS

    As the digital world expanded, demand grew for more sophisticated chatbots. Vinyals and Le at Google took on this challenge, showcasing a model that employed deep learning, specifically sequence-to-sequence LSTMs, to handle conversational data. Their work provided the foundation for the next generation of chatbots, dialogue systems, and other conversational AI applications.
    Paper: A Neural Conversational Model
  • 2017: TRANSFORMERS AND ATTENTION MECHANISMS

    The Transformer architecture, proposed by Vaswani and colleagues at Google, was a turning point in deep learning for NLP. By emphasizing the importance of attention mechanisms, which weigh input features differently, the Transformer model sidestepped the limitations of sequence-based approaches like RNNs. Its flexibility and efficiency became the foundation for several influential models like BERT, GPT, and more.
    Paper: Attention Is All You Need
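    The heart of the Transformer is scaled dot-product attention: each query attends to every key, and the resulting softmax weights mix the corresponding values. A minimal NumPy rendering, with random matrices standing in for learned query/key/value projections:

      import numpy as np

      def scaled_dot_product_attention(Q, K, V):
          """softmax(Q K^T / sqrt(d_k)) V, computed for every query row."""
          d_k = Q.shape[-1]
          scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
          return weights @ V                                    # weighted sum of values

      rng = np.random.default_rng(0)
      Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
      K = rng.normal(size=(6, 8))   # 6 key positions
      V = rng.normal(size=(6, 8))   # one value vector per key
      print(scaled_dot_product_attention(Q, K, V).shape)        # (4, 8)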
  • 2018: GPT: GENERATIVE PRE-TRAINED TRANSFORMER

    OpenAI's GPT showcased the potential of the Transformer architecture in generating coherent and contextually relevant text passages. As a language model, GPT was not just an evolution in scale but also in capability, effectively generating text that was often indistinguishable from human writing in various tasks.
    Paper: Improving Language Understanding by Generative Pre-Training
  • 2019: NEURAL MACHINE TRANSLATION MILESTONE

    The quest for effective machine translation saw a significant advancement with Facebook AI's LASER. By offering multilingual sentence embeddings, LASER facilitated zero-shot cross-lingual transfer, allowing for translations even in language pairs with limited direct translation data.
    Paper: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
  • 2020: GPT-3 AND FEW-SHOT LEARNING

    OpenAI unveiled GPT-3, setting a new benchmark in the NLP realm. With a staggering 175 billion parameters, GPT-3 showcased the ability to perform tasks using few-shot learning, demonstrating tasks with minimal instruction and without needing explicit task-specific training data, highlighting the model's versatility and adaptability.
    Paper: Language Models are Few-Shot Learners
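    Few-shot learning here means supplying worked examples directly in the prompt instead of fine-tuning the model; a schematic, entirely invented sentiment-classification prompt might look like this:

      # Hypothetical few-shot prompt: two labeled examples, then the query to complete.
      examples = [
          ("The plot was gripping and the acting superb.", "positive"),
          ("I walked out halfway through, utterly bored.", "negative"),
      ]
      query = "A delightful surprise from start to finish."

      prompt = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
      prompt += f"\n\nReview: {query}\nSentiment:"

      print(prompt)  # a model would be expected to continue with " positive"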
