History Of Machine Learning

Early Mathematical Foundations (1800s - Early 1900s)

The roots of machine learning lie in statistics and mathematics. Statistics took shape as a discipline during the 19th century and laid the groundwork for machine learning: key concepts such as regression and maximum likelihood estimation were developed between the early 1800s and the early 1900s.
1943: MCCULLOCH & PITTS' NEURAL NETWORK

Neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work, modeling a simple neural network using electrical circuits. This is seen as the starting point for the concept of Artificial Neural Networks.
Research Paper: A Logical Calculus of the Ideas Immanent In Nervous Activity
1947: LINEAR PROGRAMMING

The development of linear programming is credited to several people, but the most influential was George Dantzig. Dantzig, an American mathematician, developed the simplex method, a popular algorithm for solving linear programming problems. He published his work in 1947, and it quickly became the standard method for solving linear programming problems.
Research Paper: Linear Programming
1949: HEBB'S LEARNING RULE

Donald Hebb proposed the theory that neural pathways are strengthened each time they are used, a concept fundamental to the ways in which humans learn. This formed the basis of Hebbian learning, a principle used in several neural network models.
Wiki: Hebbian Theory
Book: The Organization of Behavior
1950: Turing's Concept of Learning Machines

British mathematician and logician Alan Turing introduced the idea of machines that could learn from experience. His 1950 paper "Computing Machinery and Intelligence" proposed what is now known as the "Turing Test" and discussed how a machine might be taught rather than programmed by hand. This work helped lay the foundation for the field of machine learning.
Research Paper: Computing Machinery and Intelligence
1951: Minsky's First Neural Network Machine

Marvin Minsky, who would later become one of the pioneers of artificial intelligence, built the SNARC (Stochastic Neural Analog Reinforcement Calculator), the first neural network machine, suggesting that machines could potentially simulate human intelligence.
Wiki: SNARC
Article
1956: Coining of "Artificial Intelligence"

The phrase "Artificial Intelligence" was coined by John McCarthy at the Dartmouth Conference, the first AI conference to bring together researchers interested in machine intelligence. This is where the idea that machines could simulate any human intelligence was first seriously considered.
Research Paper: A Proposal For The Dartmouth Summer Research Project On Artificial Intelligence
1957: Rosenblatt's Invention of Perceptron

Frank Rosenblatt invented the Perceptron, a type of linear classifier that forms the basis for many neural networks. The Perceptron was the first model that could learn from its mistakes, making it a milestone in the machine learning field.
Research Paper: The Perceptron: A Perceiving and Recognizing Automaton
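
To make "learning from its mistakes" concrete, here is a minimal sketch of the error-driven perceptron update in Python. The function names, the learning rate, and the logical AND toy data are illustrative assumptions, not taken from Rosenblatt's paper.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train a single-layer perceptron with the error-driven update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # -1, 0, or +1
            if error != 0:
                # Only a wrong prediction changes the weights.
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:
            break  # every training sample is classified correctly
    return weights, bias


def predict(weights, bias, x):
    """Classify one input with the learned weights."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Toy data (assumption): logical AND, which is linearly separable.
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]

w, b = train_perceptron(AND_INPUTS, AND_LABELS)
print([predict(w, b, x) for x in AND_INPUTS])
```

Because AND is linearly separable, the loop stops once an epoch passes with no mistakes. On data that is not linearly separable, the same loop would simply run out of epochs without converging.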
1959: Samuel's Self-Learning Program

Arthur Samuel created a program that could play checkers and, more importantly, learn from its mistakes. This program is considered the first self-learning program, and Samuel coined the term "machine learning" to describe the ability of a machine to learn from data.
Research Paper: Some Studies in Machine Learning Using the Game of Checkers

1960: Introduction of ADALINE and MADALINE

Bernard Widrow and Marcian Hoff of Stanford developed models called ADALINE and MADALINE. These models were early ancestors to the modern neural networks and were the first learning machines to use an adaptive filter.

NOTE:
While it's true that Bernard Widrow and Ted Hoff created the ADALINE model (Adaptive Linear Neuron), the MADALINE (Multiple ADALINE) was actually introduced later in 1962.

Wiki: According to Wikipedia, it was in 1960
Stanford: According to Stanford, it was in 1959
Research Paper: Adaptive Switching Circuits

1963: The Foundation of Kernel Methods

In 1963, Aizerman, Braverman, and Rozonoer proposed the "potential functions" algorithm, which is seen as the precursor to the modern Kernel method. This theory introduced a way to make linear algorithms work in high-dimensional space, a principle which later influenced the development of Support Vector Machines and other related algorithms.
Research Paper: Theoretical Foundations of the Potential Function Method in Pattern Recognition
1967: The Nearest Neighbor Rule

Cover and Hart introduced the k-nearest neighbor algorithm, a simple but effective classification and regression method. This was one of the first instance-based learning algorithms, where the function is approximated locally and all computation is deferred until classification. It's a non-parametric method which remains widely used today.
Research Paper: Nearest Neighbor Pattern Classification
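
The idea of deferring all computation until classification can be sketched in a few lines of Python. This is an illustrative k-nearest-neighbor classifier under simple assumptions (Euclidean distance, majority vote, made-up toy points); the names are hypothetical and not from the Cover and Hart paper.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # No model is fitted in advance: distances are computed at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (assumption): two well-separated groups in the plane.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
LABELS = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(POINTS, LABELS, (1.1, 1.0)))
print(knn_classify(POINTS, LABELS, (5.1, 5.0), k=1))
```

The method is non-parametric in the sense that the "model" is the stored training set itself, so memory and query cost grow with the amount of data.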
1968: THE GROUP METHOD OF DATA HANDLING (GMDH)

Alexey Ivakhnenko, a Ukrainian scientist, introduced the Group Method of Data Handling (GMDH). This approach involves the use of algorithms for the modeling of complex systems, which work by generating a series of polynomial models and then selecting the best one through the use of a selection criterion. The models are constructed layer by layer, in a hierarchical manner, in order to capture intricate relationships between inputs and outputs. The method is particularly useful when dealing with multi-parametric systems and noisy data. GMDH has been widely applied in fields such as data mining, prediction, and complex system modeling.
Wiki: GMDH
1969: The Limitations of Perceptrons

In their book "Perceptrons," Marvin Minsky and Seymour Papert analyzed the limitations of Rosenblatt's perceptron. They showed that a single-layer perceptron cannot learn the XOR function, because XOR is not linearly separable. The book contributed to a shift in research away from neural networks toward symbolic methods for the next decade, a period commonly referred to as an "AI winter."
PDF Book: Perceptrons
Amazon Book: Perceptrons
1970: Machine Learning Takes Shape

The 1970s saw the birth of many foundational algorithms and concepts in machine learning, with research focusing on decision tree algorithms, clustering, and ensemble methods.
1973: The C-means Algorithm

J. C. Dunn introduced the fuzzy C-means algorithm in 1973. It is a clustering method that allows data points to belong to multiple clusters with varying degrees of membership. This ability to make "soft" cluster assignments differentiated it from earlier "hard" clustering methods like K-means and opened the door to more flexible machine learning models.
Wiki: Fuzzy Clustering
Research Paper: A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters
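
The notion of "varying degrees of membership" can be illustrated with the membership formula commonly used in fuzzy C-means, where a point's membership in a cluster depends on its distance to that cluster's center relative to its distances to all centers. The sketch below assumes one-dimensional data, fixed centers, and a fuzzifier m = 2; the function name is hypothetical, and a full implementation would also re-estimate the centers iteratively.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Return the soft membership of a 1-D point in each cluster (sums to 1)."""
    distances = [abs(point - c) for c in centers]
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        # A point sitting exactly on a center belongs fully to that cluster.
        share = 1.0 / sum(zeros)
        return [share if is_zero else 0.0 for is_zero in zeros]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# Toy centers (assumption): two clusters at 0 and 10.
CENTERS = [0.0, 10.0]
print(fuzzy_memberships(5.0, CENTERS))  # equidistant point
print(fuzzy_memberships(2.0, CENTERS))  # closer to the first center
```

A point midway between two centers gets equal membership in both, while a point near one center gets most, but not all, of its membership there. A hard method such as K-means would assign it entirely to the nearest cluster.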
1973: AUTOMATIC INTERACTION DETECTOR (AID)

James N. Morgan and John A. Sonquist introduced the AID (Automatic Interaction Detector) algorithm, one of the first decision tree algorithms. It was first described in the 1960s and revised in the early 1970s. The algorithm builds a tree by recursively splitting the data on the predictor that best separates the response, an idea that shaped later decision tree methods.
1974: The Birth of the CART Algorithm

The CART (Classification And Regression Trees) algorithm, another important decision tree algorithm, was developed by Breiman, Friedman, Olshen, and Stone and published in their 1984 book of the same name. CART improved on earlier decision tree algorithms by adding pruning techniques to reduce overfitting, making it more applicable to machine learning tasks.
Wiki: CART (Decision Tree)
Amazon Book: Classification And Regression Trees (1st Edition)
1975: The Invention of ID3 Algorithm

Ross Quinlan, a researcher in the field of machine learning, developed the ID3 (Iterative Dichotomiser 3) algorithm. ID3 is an important decision tree algorithm that was influential in the development of later algorithms like C4.5 and random forests.
Wiki: ID3 Algorithm
Research Paper: Induction Of Decision Trees
1980: THE FIRST INTERNATIONAL CONFERENCE ON MACHINE LEARNING

In 1980, the first meeting dedicated solely to machine learning was held as a workshop in Pittsburgh; it grew into the International Conference on Machine Learning (ICML). It marked a major milestone by bringing together researchers and practitioners from different fields to discuss advances and start collaborations.
Wiki: List of all ICML Conference

1981: Explanation-Based Learning

The concept of Explanation-Based Learning (EBL) was introduced by Gerald DeJong in 1981. EBL is a form of machine learning that uses detailed knowledge of a domain to generalize from very few examples, often a single one, in contrast to approaches that rely on large amounts of data. This marked a shift towards machine learning systems that learn by reasoning about why an example works.

NOTE:
It's important to note that the development of EBL was a collective effort, and many other researchers also contributed to its evolution. This includes the work in the 1980s by researchers such as Tom M. Mitchell, who contributed to the field through work on understanding and generalizing from examples.

Wiki: Explanation Based Learning
Research Paper: Generalizations Based on Explanations

1984: VALIANT’S CONCEPT OF 'PROBABLY APPROXIMATELY CORRECT' (PAC)

Leslie Valiant proposed the 'Probably Approximately Correct' (PAC) concept in learning theory. The PAC model provides a mathematical definition of learning and was a significant theoretical advance. This approach gave rise to the field of computational learning theory.
Wiki: Probably Approximately Correct Learning
Research Paper: A Theory of the Learnable
Amazon Book: Probably Approximately Correct (1st Edition)
1985: NetTalk

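One of the significant events of 1985 was Terry Sejnowski's work on NETtalk, developed with Charles Rosenberg. NETtalk was a neural network that learned to pronounce written English text from examples, with output that improved gradually during training in a way often compared to a child learning to speak. This was a significant step towards neural networks that could 'learn'.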
Wiki: NETtalk
Research Paper: Parallel Networks that Learn to Pronounce English Text
1986: Back-Propagation Algorithms

A crucial development in machine learning came about in 1986 with the introduction of the back-propagation algorithm for training multi-layer neural networks. It became a foundational method for training neural networks, published by David Rumelhart, Geoffrey Hinton, and Ronald Williams.
Wiki: Backpropagation
Research Paper: Learning Representations by Back-Propagating Errors
1986: Linear Predictive Coding

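In the mid-1980s, Manfred R. Schroeder and Bishnu S. Atal introduced code-excited linear prediction (CELP), which builds on earlier linear predictive coding (LPC) methods. This line of work had a substantial impact on subsequent speech coding, speech recognition, and speech synthesis systems.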
Wiki: Linear Predictive Coding
Research Paper: CELP: High Quality Speech at Very Low Bit Rates
1987: Q-Learning

In 1987, a model-free reinforcement learning algorithm was introduced by Chris Watkins. The method, called Q-Learning, would go on to become an essential component in much of the future research in reinforcement learning.
Wiki: Q-Learning
Research Paper: Learning from Delayed Rewards
**Chris Watkins first developed the algorithm in his PhD thesis, which was submitted in 1987. However, the thesis was not published until 1989.
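
The core of Q-Learning is a single update applied after each observed transition: the estimate for a state-action pair moves toward the reward plus the discounted value of the best action in the next state. The following is a minimal tabular sketch; the state names, action names, and the values of the learning rate `alpha` and discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new estimate."""
    # Model-free: only the observed transition is used, no environment model.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Toy chain (assumption): s0 -> s1 -> goal, with reward 1.0 on reaching the goal.
ACTIONS = ["left", "right"]
q = defaultdict(float)  # unseen state-action pairs start at 0.0

first = q_update(q, "s1", "right", 1.0, "goal", ACTIONS)
second = q_update(q, "s0", "right", 0.0, "s1", ACTIONS)
print(first, second)
```

The second update shows how reward information propagates backward: `s0` receives no reward itself, but its estimate rises because the best action available in `s1` already looks valuable.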
1988: Reinforcement Learning

In 1988, Richard S. Sutton published a paper on temporal-difference learning that helped to formalize the ideas of reinforcement learning. Reinforcement learning now stands as one of the essential parts of machine learning, helping software agents improve their actions based on reward feedback.
Wiki: Reinforcement Learning
Research Paper: Learning to Predict by the Methods of Temporal Differences
1992: Advances in Reinforcement Learning

Richard Sutton and Andrew Barto wrote "Reinforcement Learning: An Introduction" (first edition published in 1998), a highly influential book that provided a clear and simple account of the key ideas and algorithms of reinforcement learning. Their work has been instrumental in making reinforcement learning accessible to a broader audience and set the foundation for further developments in the field.
Amazon Book: Reinforcement Learning: An Introduction
1995: Support Vector Machines (SVM)

Corinna Cortes and Vladimir Vapnik developed the soft-margin Support Vector Machine (SVM), published as "Support-Vector Networks." The method finds the separating hyperplane with the largest margin between classes while tolerating some misclassified training points. SVM was a significant step forward in machine learning, as it opened up new possibilities for linear and, with kernels, non-linear classification and regression tasks.
Wiki: Support Vector Machine
Research Paper: Support-Vector Networks
1996: THE RISE OF ENSEMBLE METHODS

Leo Breiman, a statistician at the University of California, Berkeley, proposed an ensemble method known as Bootstrap Aggregating (or "bagging"). Bagging trains multiple copies of a predictor, often a decision tree, on bootstrap samples of the training data (random samples drawn with replacement) and aggregates their outputs by voting or averaging. This marked a significant advancement in the use of ensemble methods in machine learning, which combine multiple models to achieve better predictive performance than any of the individual models could achieve alone.
Wiki: Bootstrap Aggregation
Research Paper: Bagging Predictors
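
The bagging procedure can be sketched as two steps: fit each model on a bootstrap sample drawn with replacement, then combine predictions by majority vote. The example below is a rough illustration only. The base learner `fit_stump` is a hypothetical one-dimensional threshold rule chosen to keep the code short, not the predictor used in Breiman's paper, and the toy data is made up.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Hypothetical base learner: threshold halfway between the two class means."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    if not zeros or not ones:
        # Degenerate bootstrap sample with a single class: always predict it.
        constant = ys[0]
        return lambda x: constant
    mean_zero = sum(zeros) / len(zeros)
    mean_one = sum(ones) / len(ones)
    threshold = (mean_zero + mean_one) / 2
    ones_above = mean_one > mean_zero
    return lambda x: int((x > threshold) == ones_above)


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (assumption): class 0 at small values, class 1 at large values.
XS = [1.0, 1.5, 2.0, 2.5, 7.0, 7.5, 8.0, 8.5]
YS = [0, 0, 0, 0, 1, 1, 1, 1]

models = bagging_fit(XS, YS)
print(bagging_predict(models, 0.5), bagging_predict(models, 9.0))
```

An odd number of models avoids tied votes in a two-class problem. For regression, the vote would be replaced by an average of the individual predictions.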
1996: Boosting and AdaBoost

Yoav Freund and Robert Schapire proposed AdaBoost, the first practical boosting algorithm, which is an ensemble method that combines weak classifiers to form a strong classifier. This algorithm significantly contributed to the development of machine learning by enhancing the performance of decision trees and sparking further research into boosting methods.
Wiki: Boosting
Research Paper: Experiments with a New Boosting Algorithm
1997: LSTM Networks

Sepp Hochreiter and Jürgen Schmidhuber proposed Long Short-Term Memory (LSTM) networks, a kind of recurrent neural network (RNN) designed to handle long-term dependencies, which standard RNNs struggle to learn because of vanishing gradients. LSTM has been widely used in deep learning for tasks that require remembering information over long periods, such as time series prediction and natural language processing.
Wiki: Long Short Term Memory
Research Paper: Long Short-Term Memory
1998: Convolutional Neural Networks (CNN)

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner developed LeNet-5, a pioneering 7-level convolutional network for recognition of handwritten and machine-printed characters. It laid the foundation for modern convolutional neural networks (CNNs) and spurred a resurgence of interest in neural networks.
Wiki: Convolutional Neural Network
Research Paper: Gradient-Based Learning Applied to Document Recognition
1998: High-Dimensional Data Visualization

Christopher Bishop, Markus Svensén, and Christopher K. I. Williams proposed the Generative Topographic Mapping (GTM) algorithm, a principled framework for visualizing high-dimensional data. This made it easier for machine learning practitioners to understand and interpret complex datasets, marking a step forward in exploratory data analysis.
Wiki: Generative Topographic Map
Research Paper: GTM: The Generative Topographic Mapping
2000: Support Vector Machines Rise to Prominence

Support Vector Machines (SVMs), introduced by Vladimir Vapnik in the 90s, gained significant attention in the 2000s. SVMs, especially with kernel tricks, proved effective in a variety of classification and regression tasks, establishing themselves as a powerful tool in machine learning.
2001: Bagging and Random Forests

Leo Breiman proposed the random forest algorithm, an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees. Random forests are a significant development in machine learning, as they provide high prediction accuracy and feature importance estimation, and are robust against overfitting.
Wiki: Random Forest
Research Paper: Random Forests
2001: The First Kernel Trick

In 2001, Bernhard Schölkopf and Alexander J. Smola published "Learning with Kernels," a comprehensive treatment of the "kernel trick." This technique lets SVMs and other linear algorithms operate implicitly in high-dimensional feature spaces by replacing inner products with kernel evaluations, making them more powerful and flexible. The book helped establish kernel methods as a mainstream part of machine learning.
Wiki: Kernel Methods
Amazon Book: Learning with Kernels
2003: Latent Dirichlet Allocation (LDA)

David M. Blei, Andrew Y. Ng, and Michael I. Jordan introduced a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics.
Wiki: LDA
Research Paper: Latent Dirichlet Allocation
2006: The Era of Deep Learning Begins

Hinton, Osindero, and Teh published a paper on a fast learning algorithm for deep belief nets, marking the resurgence of neural networks in the form of deep learning. This paper demonstrated how to effectively train deep neural networks, leading to significant advancements in areas like speech recognition, image classification, and natural language processing.
Research Paper: A Fast Learning Algorithm for Deep Belief Nets
2007: The Netflix Prize

The Netflix Prize was a machine learning and data mining competition for movie rating prediction, launched in October 2006. In 2009 it awarded $1 million to the BellKor's Pragmatic Chaos team for improving the accuracy of Netflix's recommendation algorithm by over 10%. This competition drew worldwide attention to the capabilities of machine learning algorithms in real-world applications.
About Netflix Prize: Paper
2009: ImageNet Begins

The ImageNet project kicked off in 2009, aiming to provide a large-scale labeled dataset for researchers working on computer vision problems. It later became the basis for the annual ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), which propelled advancements in deep learning and AI.
Research Paper: ImageNet: A Large-Scale Hierarchical Image Database
2010: Kinect, Computer Vision in the Consumer Market

Microsoft launched the Kinect as a motion-sensing input device for the Xbox 360. It used machine learning in real-time to process depth data. Kinect's technology brought computer vision to the consumer market, and the device had widespread implications for how machine learning could be implemented in consumer technology.
Research Paper: Microsoft Kinect Sensor and Its Effect
** A research paper on how Kinect works was published in 2012.
2010: The Emergence of Deep Learning

Geoffrey Hinton and his colleagues presented a paper that discussed using deep belief networks for phone recognition, marking a significant event in the application of deep learning. However, deep learning's conceptual underpinnings extend back to earlier years.
Research Paper: Deep Belief Networks Using Discriminative Features for Phone Recognition
2012: Google's Large Scale Distributed Deep Networks

In a landmark paper, Google researchers demonstrated the power of using large-scale distributed systems to train deep learning models. This paper showcased Google's distributed computing capabilities and how they could be used to train large deep learning models on massive datasets, a trend that continues to this day.
Research Paper: Large Scale Distributed Deep Networks
2012: AlexNet and the Deep Learning Revolution

In 2012, a deep Convolutional Neural Network (CNN) called AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a prestigious competition in image classification. This marked a turning point in the adoption of deep learning methods, as AlexNet significantly outperformed traditional machine learning approaches.
Wiki: AlexNet
Research Paper: ImageNet Classification with Deep Convolutional Neural Networks
2013: Emergence of Word Embeddings in NLP

Tomas Mikolov et al. from Google introduced word2vec, a group of related models used to produce word embeddings: dense numerical vectors learned from the contexts in which words appear. This revolutionized Natural Language Processing (NLP) by letting machines represent how words relate to one another through the distance and direction between word vectors.
Google Research: Word2Vec (Colab)
Research Paper: Efficient Estimation of Word Representations in Vector Space
2014: Generative Adversarial Networks (GANs)

Ian Goodfellow and his colleagues introduced the concept of Generative Adversarial Networks (GANs). GANs consist of two neural networks, a generator and a discriminator, that are trained together. The generator tries to create data that the discriminator cannot distinguish from real data, resulting in the creation of realistic synthetic data.
Book: Deep Learning (Ian Goodfellow, MIT Press)
Research Paper: Generative Adversarial Nets
2016: The AlphaGo Milestone

Google's DeepMind developed AlphaGo, a computer program that defeated world champion Go player Lee Sedol in a five-game match. This was considered a major breakthrough in AI, as the game of Go, with its vast number of potential moves, had long been considered a challenging benchmark for AI.
Google Deep Mind: AlphaGo
Research Paper: Mastering the Game of Go with Deep Neural Networks and Tree Search
2017: Rise of the Transformers in NLP

The Transformer model, introduced in the paper "Attention Is All You Need" by Vaswani et al., brought about a significant change in the field of NLP. The model dispenses with recurrence and convolution and relies entirely on attention mechanisms, which let it weigh different parts of the input sequence when producing each output. This improved translation quality and made training easier to parallelize.
Research Paper: Attention Is All You Need
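
The attention operation at the heart of the Transformer, scaled dot-product attention, computes a weighted average of value vectors, where the weights come from a softmax over query-key dot products divided by the square root of the key dimension. The sketch below uses plain Python lists to stay dependency-free; the function names and toy vectors are illustrative assumptions, and real implementations use batched matrix operations, multiple heads, and learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a softmax-weighted average of the value vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Toy inputs (assumption): two keys, each paired with a distinct value vector.
KEYS = [[1.0, 0.0], [0.0, 1.0]]
VALUES = [[10.0, 0.0], [0.0, 10.0]]

focused = scaled_dot_product_attention([[5.0, 0.0]], KEYS, VALUES)[0]
neutral = scaled_dot_product_attention([[0.0, 0.0]], KEYS, VALUES)[0]
print(focused, neutral)
```

A query aligned with the first key pulls the output toward the first value vector, while a query that matches neither key yields an even blend of both.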
2018: Language Models Take Center Stage

OpenAI's GPT (Generative Pretrained Transformer) and Google's BERT (Bidirectional Encoder Representations from Transformers) were released, setting new standards in language modeling. These transformer-based models demonstrated unprecedented performance on a variety of NLP tasks and led to the emergence of large language models as a key area of research in machine learning.
GPT Research Paper: Improving Language Understanding by Generative Pre-Training
BERT Research Paper: BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding
2019: The Dawn of EfficientNets

Google's AI team introduced EfficientNets, a family of advanced Convolutional Neural Networks (CNNs). The EfficientNet models use a compound scaling method that scales network depth, width, and input resolution together using a single coefficient, resulting in models that are both smaller and more efficient, yet perform as well as or better than larger, more complex models.
Google Research Blog: EfficientNet
Research Paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
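
Compound scaling can be shown with a little arithmetic. A single coefficient phi scales depth by alpha^phi, width by beta^phi, and input resolution by gamma^phi, with the constants chosen so that alpha * beta^2 * gamma^2 is roughly 2, which makes the compute cost grow by about 2^phi. The sketch below uses the base constants reported in the EfficientNet paper (alpha = 1.2, beta = 1.1, gamma = 1.15). The function name is hypothetical, and the published models round the resulting sizes, so the outputs here are multipliers rather than actual layer counts.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return depth, width, resolution, and approximate FLOPs multipliers."""
    return {
        "depth": alpha ** phi,        # more layers
        "width": beta ** phi,         # more channels per layer
        "resolution": gamma ** phi,   # larger input images
        # FLOPs grow linearly with depth and quadratically with the other two.
        "flops": (alpha * beta ** 2 * gamma ** 2) ** phi,
    }


for phi in range(4):
    scale = compound_scale(phi)
    print(phi, {name: round(value, 3) for name, value in scale.items()})
```

With phi = 0 every multiplier is 1, which corresponds to the baseline network. Each increment of phi roughly doubles the compute budget while keeping the three dimensions in balance.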
2020: GPT-3 and the Rise of Large Language Models

OpenAI introduced GPT-3, a transformer-based language model with 175 billion machine learning parameters. This was a massive leap from its predecessor, GPT-2, which had 1.5 billion parameters. GPT-3 demonstrated remarkable capabilities in generating human-like text, raising both hopes and concerns about the future of AI.
Research Paper: Language Models are Few-Shot Learners
