History Of Machine Learning

Early Mathematical Foundations (1800s to Early 1900s)
The roots of machine learning lie in statistics and mathematics. The field of statistics that started to take shape in the 19th century laid the groundwork for machine learning. Key concepts like regression and maximum likelihood estimation were developed during this time.

1943: MCCULLOCH & PITTS' NEURAL NETWORK
Neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might work, modeling a simple neural network using electrical circuits. This is seen as the starting point for the concept of Artificial Neural Networks.
Research Paper: A Logical Calculus of the Ideas Immanent In Nervous Activity

1947: LINEAR PROGRAMMING
The development of linear programming is credited to several people, but the most influential was George Dantzig. Dantzig, an American mathematician, developed the simplex method, a popular algorithm for solving linear programming problems. He published his work in 1947, and it quickly became the standard method for solving linear programming problems.
Research Paper: Linear Programming

1949: HEBB'S LEARNING RULE
Donald Hebb proposed the theory that neural pathways are strengthened each time they are used, a concept fundamental to the ways in which humans learn. This formed the basis of Hebbian learning, a principle used in several neural network models.
Wiki: Hebbian Theory
Book: The Organization of Behavior

1950: Turing's Concept of Learning Machines
British mathematician and logician Alan Turing introduced the idea of machines that could learn from experience. In his paper "Computing Machinery and Intelligence," he proposed what is now known as the "Turing Test." This work laid the foundation for the field of machine learning.
Research Paper: Computing Machinery and Intelligence

1951: Minsky's First Neural Network Machine

1956: Coining of "Artificial Intelligence"
The phrase "Artificial Intelligence" was coined by John McCarthy for the Dartmouth Conference, the first conference to bring together researchers interested in machine intelligence. This is where the idea that machines could simulate any aspect of human intelligence was first seriously considered.
Research Paper: A Proposal For The Dartmouth Summer Research Project On Artificial Intelligence

1957: Rosenblatt's Invention of Perceptron
Frank Rosenblatt invented the Perceptron, a type of linear classifier that forms the basis for many neural networks. The Perceptron was the first model that could learn from its mistakes, making it a milestone in the machine learning field.
Research Paper: The Perceptron: A Perceiving and Recognizing Automaton
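The idea of a linear classifier that learns from its mistakes can be illustrated with a minimal sketch. This is not Rosenblatt's original formulation; the function names, the learning rate, and the logical-AND toy data are assumptions chosen for illustration.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights and a bias for a linear threshold unit (illustrative sketch)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # 0 when correct, +1 or -1 on a mistake
            if error != 0:
                # Update only on mistakes: nudge the boundary toward the example.
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: training has converged
            break
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical toy data: logical AND, which is linearly separable.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
weights, bias = train_perceptron(and_inputs, and_labels)
and_predictions = [predict(weights, bias, x) for x in and_inputs]
```

The update touches the weights only when a prediction is wrong, which is the sense in which the model learns from its mistakes. On data that is not linearly separable, this loop simply stops after the epoch limit without converging.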

1959: Samuel's Self-Learning Program
Arthur Samuel created a program that could play checkers and, more importantly, learn from its mistakes. This program is considered the first self-learning program, and Samuel coined the term "machine learning" to describe the ability of a machine to learn from data.
Research Paper: Some Studies in Machine Learning Using the Game of Checkers

1960: Introduction of ADALINE and MADALINE
Bernard Widrow and Marcian Hoff of Stanford developed models called ADALINE and MADALINE. These models were early ancestors to the modern neural networks and were the first learning machines to use an adaptive filter.
NOTE:
While it's true that Bernard Widrow and Ted Hoff created the ADALINE (Adaptive Linear Neuron) model, MADALINE (Multiple ADALINE) was actually introduced later, in 1962.
Wiki: According to Wiki it was in 1960
Stanford: According to Stanford it was in 1959
Research Paper: Adaptive Switching Circuits

1963: The Foundation of Kernel Methods
In 1963, Aizerman, Braverman, and Rozonoer proposed the "potential functions" algorithm, which is seen as the precursor to modern kernel methods. This theory introduced a way to make linear algorithms work in high-dimensional space, a principle which later influenced the development of Support Vector Machines and other related algorithms.
Research Paper: Theoretical Foundations of the Potential Function Method in Pattern Recognition

1967: The Nearest Neighbor Rule
Cover and Hart introduced the k-nearest neighbor algorithm, a simple but effective classification and regression method. This was one of the first instance-based learning algorithms, where the function is approximated locally and all computation is deferred until classification. It's a non-parametric method which remains widely used today.
Research Paper: Nearest Neighbor Pattern Classification
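As a rough illustration of deferring all computation until classification, the sketch below stores the training points untouched and does its work only when a query arrives. The function name, the Euclidean distance metric, and the toy points are assumptions, not details from Cover and Hart's paper.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training points."""
    # There is no training phase: distances are computed at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_classify(points, labels, (1.1, 1.0))
near_b = knn_classify(points, labels, (5.0, 4.8))
```

The choice of k and of the distance metric is left to the user; an odd k avoids tied votes in two-class problems.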

1968: THE GROUP METHOD OF DATA HANDLING (GMDH)
Alexey Ivakhnenko, a Ukrainian scientist, introduced the Group Method of Data Handling (GMDH). This approach involves the use of algorithms for the modeling of complex systems, which work by generating a series of polynomial models and then selecting the best one through the use of a selection criterion. The models are constructed layer by layer, in a hierarchical manner, in order to capture intricate relationships between inputs and outputs. The method is particularly useful when dealing with multiparametric systems and noisy data. GMDH has been widely applied in fields such as data mining, prediction, and complex system modeling.
Wiki: GMDH

1969: The Limitations of Perceptrons
In their book "Perceptrons," Marvin Minsky and Seymour Papert presented the limitations of perceptrons and of Rosenblatt's perceptron learning theorem. They showed that a single-layer perceptron cannot represent the XOR function, which contributed to a shift in research away from neural networks toward symbolic methods for the next decade, a period commonly referred to as the "AI winter."
PDF Book: Perceptrons
Amazon Book: Perceptrons

1970s: Machine Learning Takes Shape
The 1970s saw the birth of many foundational algorithms and concepts in machine learning, with research focusing on decision tree algorithms, clustering, and ensemble methods.

1973: The C-means Algorithm
J. C. Dunn introduced the C-means (Fuzzy C-means) algorithm in 1973. It's a method of clustering that allows data points to belong to multiple clusters with varying degrees of membership. This ability to have "soft" cluster assignments differentiated it from earlier "hard" clustering methods like K-means and opened the door for more flexible machine learning models.
Wiki: Fuzzy Clustering
Research Paper: A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters

1973: AUTOMATIC INTERACTION DETECTOR (AID)
James Morgan and John Sonquist introduced the AID (Automatic Interaction Detector) algorithm, one of the first decision tree algorithms, originally described in 1963. The algorithm was used for building decision trees from data and had major implications for the future of machine learning.

1974: The Birth of the CART Algorithm
The CART (Classification And Regression Trees) algorithm, another important decision tree algorithm, was introduced by Breiman, Friedman, Olshen, and Stone. The CART algorithm improved on earlier decision tree algorithms with the addition of pruning techniques to avoid overfitting, making it more applicable for machine learning tasks.
Wiki: CART (Decision Tree)
Amazon Book: Classification And Regression Trees (1st Edition)

1975: The Invention of the ID3 Algorithm
Ross Quinlan, a researcher in the field of machine learning, developed the ID3 (Iterative Dichotomiser 3) algorithm. ID3 is an important decision tree algorithm that was influential in the development of later algorithms like C4.5 and random forests.
Wiki: ID3 Algorithm
Research Paper: Induction Of Decision Trees

1980: THE FIRST INTERNATIONAL CONFERENCE ON MACHINE LEARNING
In 1980, the first conference dedicated solely to machine learning, ICML, was held. This conference marked a major milestone by bringing together professionals and researchers from different fields to discuss advancements and make collaborations in the field of machine learning.
Wiki: List of all ICML Conferences

1981: Explanation-Based Learning
The concept of Explanation-Based Learning (EBL) was introduced by Gerald DeJong in 1981. EBL is a form of machine learning that uses a detailed understanding of a problem to make generalized decisions, contrasting with other approaches that rely on extensive data. This marked a shift towards machine learning systems that can learn and reason similarly to humans.
NOTE:
It's important to note that the development of EBL was a collective effort, and many other researchers also contributed to its evolution. This includes the work in the 1980s by researchers such as Tom M. Mitchell, who contributed to the field through work on understanding and generalizing from examples.
Wiki: Explanation Based Learning
Research Paper: Generalizations Based on Explanations

1984: VALIANT’S CONCEPT OF 'PROBABLY APPROXIMATELY CORRECT' (PAC)
Leslie Valiant proposed the 'Probably Approximately Correct' (PAC) concept in learning theory. The PAC model provides a mathematical definition of learning and was a significant theoretical advance. This approach gave rise to the field of computational learning theory.
Wiki: Probably Approximately Correct Learning
Research Paper: A Theory of the Learnable
Amazon Book: Probably Approximately Correct (1st Edition)

1985: NETtalk
One of the significant events of 1985 was the development of NETtalk by Terry Sejnowski and Charles Rosenberg, a neural network that learned to pronounce written English text from examples, gradually improving much as a young child's speech does. This was a significant step towards neural networks that could 'learn'.
Wiki: NETtalk
Research Paper: Parallel Networks that Learn to Pronounce English Text

1986: Backpropagation Algorithms
A crucial development in machine learning came in 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams published the backpropagation algorithm for training multilayer neural networks. It became a foundational method for training neural networks.
Wiki: Backpropagation
Research Paper: Learning Representations by Back-Propagating Errors

1986: Linear Predictive Coding
In 1986, the development of linear predictive coding by Bishnu S. Atal and Manfred R. Schroeder had a substantial impact on subsequent speech recognition and speech synthesis systems.
Wiki: Linear Predictive Coding
Research Paper: CELP: High Quality Speech at Very Low Bit Rates

1987: Q-Learning
In 1987, a model-free reinforcement learning algorithm was introduced by Chris Watkins. The method, called Q-Learning, would go on to become an essential component in much of the future research in reinforcement learning.
Wiki: Q-Learning
Research Paper: Learning from Delayed Rewards
**Chris Watkins first developed the algorithm in his PhD thesis, which was submitted in 1987. However, the thesis was not published until 1989. 
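
What "model-free" means in practice can be sketched with a tabular version of the update rule on a toy task. The five-cell corridor, the reward of 1 at the goal, the purely random exploration, and the values of alpha and gamma below are all assumptions made for illustration; they are not taken from Watkins's thesis.

```python
import random

N_STATES = 5        # hypothetical corridor of cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action_index):
    """Toy environment: move along the corridor, reward 1.0 on reaching the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Behave randomly; Q-learning is off-policy, so it still learns
            # the values of the greedy policy.
            action = rng.randrange(2)
            next_state, reward = step(state, action)
            # The agent never consults a transition model, only sampled outcomes.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action for each non-terminal cell (1 means "move right").
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
```

After training, the greedy policy heads toward the goal from every cell, and the learned value of stepping into the goal approaches the reward of 1.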
1988: Reinforcement Learning
In 1988, Richard S. Sutton published a paper that helped to better formalize the ideas of reinforcement learning. Reinforcement learning now stands as one of the essential parts of machine learning, helping software agents improve their actions based on reward feedback.
Wiki: Reinforcement Learning
Research Paper: Learning to Predict by the Methods of Temporal Differences

1998: Advances in Reinforcement Learning
Richard Sutton and Andrew Barto published "Reinforcement Learning: An Introduction", a highly influential book that provided a clear and simple account of the key ideas and algorithms of reinforcement learning. Their work has been instrumental in making reinforcement learning accessible to a broader audience and set the foundation for further developments in the field.
Amazon Book: Reinforcement Learning: An Introduction

1995: Support Vector Machines (SVM)
Corinna Cortes and Vladimir Vapnik developed the Support Vector Machine (SVM) method for linear classification. This method was based on the concept of decision planes that define decision boundaries. SVM was a significant step forward in machine learning as it opened up new possibilities for linear and nonlinear classification and regression tasks.
Wiki: Support Vector Machine
Research Paper: Support-Vector Networks

1996: THE RISE OF ENSEMBLE METHODS
Leo Breiman, a statistician at the University of California, Berkeley, proposed an ensemble method known as Bootstrap Aggregating (or "bagging"). Bagging trains multiple models, often decision trees, on bootstrap samples of the training data and aggregates their outputs to make predictions. This marked a significant advancement in the use of ensemble methods in machine learning, which combine multiple models to achieve better predictive performance than any of the individual models could achieve alone.
Wiki: Bootstrap Aggregation
Research Paper: Bagging Predictors 
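
The bootstrap-and-aggregate idea can be sketched in a few lines. The one-dimensional threshold "stump" below is a stand-in base learner chosen for brevity, not the procedure from Breiman's paper, and the function names, ensemble size, and toy data are assumptions.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D threshold rule: predict 1 when x > threshold, else 0."""
    values = sorted(set(xs))
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(threshold):
        return sum((x > threshold) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def bagging_fit(xs, ys, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def bagging_predict(thresholds, x):
    # Aggregate by majority vote across the ensemble.
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 0 below 5, class 1 from 5 upward.
xs = [float(v) for v in range(10)]
ys = [0] * 5 + [1] * 5
ensemble = bagging_fit(xs, ys)
low_prediction = bagging_predict(ensemble, 1.0)
high_prediction = bagging_predict(ensemble, 8.0)
```

Each stump sees a slightly different resample of the data, so the thresholds vary; voting averages out that variation. An odd number of models avoids tied votes.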
1996: Boosting and AdaBoost
Yoav Freund and Robert Schapire proposed AdaBoost, the first practical boosting algorithm, which is an ensemble method that combines weak classifiers to form a strong classifier. This algorithm significantly contributed to the development of machine learning by enhancing the performance of decision trees and sparking further research into boosting methods.
Wiki: Boosting
Research Paper: Experiments with a New Boosting Algorithm

1997: LSTM Networks
Sepp Hochreiter and Jürgen Schmidhuber proposed Long Short-Term Memory (LSTM) networks, a kind of recurrent neural network (RNN) designed to avoid the long-term dependency problem. LSTM has been widely used in deep learning for tasks that require remembering information over long periods, such as time series prediction and natural language processing.
Wiki: Long Short-Term Memory
Research Paper: Long Short-Term Memory

1998: Convolutional Neural Networks (CNN)
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner developed LeNet-5, a pioneering 7-level convolutional network for recognition of handwritten and machine-printed characters. It laid the foundation for modern convolutional neural networks (CNNs) and spurred a resurgence of interest in neural networks.
Wiki: Convolutional Neural Network
Research Paper: Gradient-Based Learning Applied to Document Recognition

1998: High-Dimensional Data Visualization
Christopher Bishop, Markus Svensén, and Christopher K. I. Williams proposed the Generative Topographic Mapping (GTM) algorithm, a principled framework for visualizing high-dimensional data. This made it easier for machine learning practitioners to understand and interpret complex datasets, marking a step forward in exploratory data analysis.
Wiki: Generative Topographic Map
Research Paper: GTM: The Generative Topographic Mapping

2000: Support Vector Machines Rise to Prominence
Support Vector Machines (SVMs), introduced by Vladimir Vapnik in the 90s, gained significant attention in the 2000s. SVMs, especially with kernel tricks, proved effective in a variety of classification and regression tasks, establishing themselves as a powerful tool in machine learning.

2001: Bagging and Random Forests
Leo Breiman proposed the random forest algorithm, an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees. Random forests are a significant development in machine learning, as they provide high prediction accuracy and feature importance estimation, and are robust against overfitting.
Wiki: Random Forest
Research Paper: Random Forests

2001: The Kernel Trick
In 2001, Bernhard Schölkopf and Alexander J. Smola published "Learning with Kernels," which popularized the "kernel trick." This technique allows SVMs to operate implicitly in high-dimensional feature spaces, making them more powerful and flexible. This marked a crucial development in SVMs and machine learning at large.
Wiki: Kernel Methods
Amazon Book: Learning with Kernels

2003: Latent Dirichlet Allocation (LDA)
David M. Blei, Andrew Y. Ng, and Michael I. Jordan introduced a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics.
Wiki: LDA
Research Paper: Latent Dirichlet Allocation

2006: The Era of Deep Learning Begins
Hinton, Osindero, and Teh published a paper on a fast learning algorithm for deep belief nets, marking the resurgence of neural networks in the form of deep learning. This paper demonstrated how to effectively train deep neural networks, leading to significant advancements in areas like speech recognition, image classification, and natural language processing.
Research Paper: A Fast Learning Algorithm for Deep Belief Nets

2007: The Netflix Prize
The Netflix Prize was a machine learning and data mining competition for movie rating prediction. In 2009 it awarded $1 million to the BellKor's Pragmatic Chaos team for improving the accuracy of Netflix's recommendation algorithm by over 10%. This competition drew worldwide attention to the capabilities of machine learning algorithms in real-world applications.
About Netflix Prize: Paper

2009: ImageNet Begins
The ImageNet project kicked off in 2009, aiming to provide a large-scale labeled dataset for researchers working on computer vision problems. It later became the basis for the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which propelled advancements in deep learning and AI.
Research Paper: ImageNet: A Large-Scale Hierarchical Image Database

2010: Kinect, Computer Vision in the Consumer Market
Microsoft launched the Kinect as a motion-sensing input device for the Xbox 360. It used machine learning in real time to process depth data. Kinect's technology brought computer vision to the consumer market, and the device had widespread implications for how machine learning could be implemented in consumer technology.
Research Paper: Microsoft Kinect Sensor and Its Effect
** A research paper on the workings of Kinect was written in 2012

2010: The Emergence of Deep Learning
Geoffrey Hinton and his colleagues presented a paper that discussed using deep belief networks for phone recognition, marking a significant event in the application of deep learning. However, deep learning's conceptual underpinnings extend back to earlier years.
Research Paper: Deep Belief Networks Using Discriminative Features for Phone Recognition

2012: Google's Large Scale Distributed Deep Networks
In a landmark paper, Google researchers demonstrated the power of using large-scale distributed systems to train deep learning models. This paper showcased Google's distributed computing capabilities and how they could be used to train large deep learning models on massive datasets, a trend that continues to this day.
Research Paper: Large Scale Distributed Deep Networks

2012: AlexNet and the Deep Learning Revolution
In 2012, a deep Convolutional Neural Network (CNN) called AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a prestigious competition in image classification. This marked a turning point in the adoption of deep learning methods, as AlexNet significantly outperformed traditional machine learning approaches.
Wiki: AlexNet
Research Paper: ImageNet Classification with Deep Convolutional Neural Networks

2013: Emergence of Word Embeddings in NLP
Tomas Mikolov et al. at Google introduced word2vec, a group of related models used to produce word embeddings: numerical representations of words that capture their linguistic context. This revolutionized Natural Language Processing (NLP) by enabling machines to understand words in relation to other words, based on the "distance" between word vectors.
Google Research: Word2Vec (Colab)
Research Paper: Efficient Estimation of Word Representations in Vector Space

2014: Generative Adversarial Networks (GANs)
Ian Goodfellow and his colleagues introduced the concept of Generative Adversarial Networks (GANs). GANs consist of two neural networks, a generator and a discriminator, that are trained together. The generator tries to create data that the discriminator cannot distinguish from real data, resulting in the creation of realistic synthetic data.
Book: Deep Learning (Ian Goodfellow, MIT Press)
Research Paper: Generative Adversarial Nets

2016: The AlphaGo Milestone
Google's DeepMind developed AlphaGo, a computer program that defeated world champion Go player Lee Sedol in a five-game match. This was considered a major breakthrough in AI, as the game of Go, with its vast number of potential moves, had long been considered a challenging benchmark for AI.
Google Deep Mind: AlphaGo
Research Paper: Mastering the Game of Go with Deep Neural Networks and Tree Search

2017: Rise of the Transformers in NLP
The Transformer model, introduced in the paper "Attention Is All You Need" by Vaswani et al., brought about a significant change in the field of NLP. The model is built entirely on attention mechanisms, which allow it to focus on different parts of the input sequence when producing each output, and it improved translation accuracy.
Research Paper: Attention Is All You Need
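How attention lets a model focus on different parts of the input can be sketched with scaled dot-product attention, the building block described in the paper. This single-head, pure-Python version omits the learned projections, masking, and multi-head structure of a real Transformer; the toy keys, values, and query are assumptions for illustration.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights): each output is a weighted average of the values."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy input: three positions, one query aligned with the first axis.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

The query matches the first and third keys much better than the second, so nearly all of the attention weight lands on those two positions and the output is dominated by their values.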

2018: Language Models Take Center Stage
OpenAI's GPT (Generative Pre-trained Transformer) and Google's BERT (Bidirectional Encoder Representations from Transformers) were released, setting new standards in language modeling. These transformer-based models demonstrated unprecedented performance on a variety of NLP tasks and led to the emergence of large language models as a key area of research in machine learning.
GPT Research Paper: Improving Language Understanding by Generative Pre-Training
BERT Research Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

2019: The Dawn of EfficientNets
Google's AI team introduced EfficientNets, which are a family of advanced Convolutional Neural Networks (CNN). The EfficientNet models use a compound scaling method to uniformly scale each dimension of the network, resulting in models that are both smaller and more efficient, yet perform as well as or better than larger, more complex models.
Google Research Blog: EfficientNet
Research Paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

2020: GPT-3 and the Rise of Large Language Models
OpenAI introduced GPT-3, a transformer-based language model with 175 billion parameters. This was a massive leap from its predecessor, GPT-2, which had 1.5 billion parameters. GPT-3 demonstrated remarkable capabilities in generating human-like text, raising both hopes and concerns about the future of AI.
Research Paper: Language Models are Few-Shot Learners