Decision Trees: Unveiling the Power of Predictive Branching



Definition and Overview of Decision Trees

Welcome to our journey through the world of Decision Trees. But before we begin, you may ask – what is a Decision Tree? Picture a tree in your mind. It has a trunk, branches, and leaves. Now, imagine if this tree could help you make decisions!

In machine learning, a Decision Tree is a fancy flowchart that helps you make decisions based on certain rules. It’s like a game of “20 questions.” You start with a big question at the trunk, then move along different branches by answering smaller questions until you reach the leaves, where you find your answer!

When and Why to Use Decision Trees

Imagine you’re planning a picnic and need to decide whether to hold it indoors or outdoors. You’d probably consider various factors like weather, season, or the number of guests. You’d follow a process of elimination based on these factors to make your decision. This is exactly what Decision Trees do!

Decision Trees are great when you have lots of factors to consider, and you need to make decisions based on these factors. They’re simple to understand and interpret, and they can handle both numerical and categorical data, making them a popular choice for classification and regression tasks in machine learning.


Recap of Classification Algorithms

In our previous articles, we’ve explored various classification algorithms like Logistic Regression, Naive Bayes, and KNN Classifier. These algorithms are like detectives that try to categorize or classify information based on what they’ve learned. Today, we will meet a new member of this detective team – the Decision Tree!

Introducing the Concept of Tree-Based Models

Just like a family tree can help you understand your family’s history, a Decision Tree can help you understand how different decisions are connected. It’s a type of model that breaks down a dataset into smaller and smaller subsets while at the same time developing an associated decision tree. This tree is a graphical representation of all the possible solutions to a decision, based on certain conditions. It’s like a roadmap guiding you to your decision!

Explanation of Overfitting and How Pruning Helps Decision Trees Avoid It

In machine learning, it’s important to build models that can work well not just with the data they’ve been trained on but also with new, unseen data. But sometimes, a model might try too hard to learn from the training data, memorizing it instead of understanding it. This is like studying for an exam by memorizing the textbook instead of understanding the concepts. Such a model is said to be “overfitting.”

While Decision Trees can sometimes fall into the overfitting trap, there are ways to prune or trim the tree to prevent this. It’s like trimming the branches of a tree that’s growing too wildly, to keep it healthy. We’ll explore this concept further as we delve deeper into Decision Trees!


Description of Decision Tree Structure
Imagine you’re playing a game of “20 Questions.” You think of an animal, and your friend tries to guess what it is by asking yes/no questions. “Is it a mammal?” “Does it have four legs?” “Is it a pet?” Each question narrows down the options until they can make a good guess.

A decision tree works much like this game. It starts with a broad question at the root (the top), and for each answer, it branches off into further, more specific questions, until it arrives at a decision (the leaves at the bottom).

Each point where the tree splits into branches is called a node. The first node, where we start, is known as the root node. The nodes that branch off from there are internal nodes, and the final nodes, where we make our decision, are the leaf nodes. So, in our game of 20 questions, each question would be a node, and each final guess would be a leaf node.

Explanation of How Decision Trees Make Decisions

How does the decision tree know which questions to ask, and in what order? It’s all about picking the question that gives the most useful information, or in other words, reduces uncertainty the most.

Imagine you’re trying to guess a number between 1 and 100. If you start by asking, “Is it less than 50?” you cut the options roughly in half, whatever the answer. But if you ask, “Is it less than 10?” the answer will usually be “no,” which eliminates only about a tenth of the options. The first question is more valuable because, on average, it gives more information, and that’s why the decision tree would ask it first.
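To put numbers on the guessing game, here is a small sketch, assuming the secret number is chosen uniformly from 1 to 100. It measures the average information each question gives as the entropy of its yes/no answer, in bits. The helper names `answer_entropy` and `question_info` are our own, for illustration only.

```python
import math

def answer_entropy(p_yes):
    """Expected information (in bits) from a yes/no answer that is 'yes' with probability p_yes."""
    if p_yes in (0.0, 1.0):
        return 0.0  # the answer is certain, so it tells us nothing new
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))

numbers = range(1, 101)  # assumption: the secret number is uniform over 1..100

def question_info(threshold):
    """Average information gained by asking 'Is it less than <threshold>?'."""
    p_yes = sum(1 for n in numbers if n < threshold) / len(numbers)
    return answer_entropy(p_yes)

info_50 = question_info(50)  # 49 of 100 numbers answer 'yes': close to an even split
info_10 = question_info(10)  # only 9 of 100 numbers answer 'yes'

print(f"'Is it less than 50?' -> {info_50:.3f} bits")
print(f"'Is it less than 10?' -> {info_10:.3f} bits")
```

The first question yields almost a full bit, while the second yields less than half a bit, which is why a tree that maximizes information would ask it first.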

[Figure: Decision Tree Making Decisions]

Differences Between Decision Trees and Other Classification Algorithms

Other classification algorithms like Logistic Regression or KNN can feel like trying to catch fish in a murky pond: the model works, but it’s harder to see exactly how each prediction was reached. A decision tree is more like fishing with a net that has different sections for different types of fish. It separates the categories with explicit rules, which makes it easier to understand and interpret.


Understanding Gini Impurity and Information Gain

To make the best splits, decision trees use measures called Gini Impurity and Information Gain.

Imagine you’re sorting a mixed bag of red and blue marbles into two boxes. Gini Impurity is a measure of how mixed up the marbles are in each box. If a box has only red marbles, its Gini Impurity is 0 (no impurity). But if it has an equal number of red and blue marbles, its Gini Impurity is 0.5 (maximum impurity for binary classification).
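In formula form, the Gini impurity of a box whose marbles fall into classes with proportions p_k is 1 − Σ p_k². A minimal Python sketch (the function name `gini_impurity` is our own) reproduces the two marble cases above, plus an in-between case:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

pure_box = ["red"] * 10                    # only red marbles
mixed_box = ["red"] * 5 + ["blue"] * 5     # equal numbers of red and blue
mostly_red = ["red"] * 8 + ["blue"] * 2    # somewhere in between

print(gini_impurity(pure_box))    # 0.0
print(gini_impurity(mixed_box))   # 0.5
print(gini_impurity(mostly_red))  # about 0.32
```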

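Information Gain looks at things from the other side: instead of scoring how mixed a single box is, it measures how much a split reduces the mixing. It is usually computed with entropy, another way of scoring how mixed a box is. The more the impurity is reduced, the more information we gain.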
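Here is a sketch of that calculation, assuming entropy as the impurity measure: information gain is the parent’s entropy minus the size-weighted average entropy of the groups after the split. The two candidate splits of a 5-red, 5-blue bag are hypothetical, chosen to contrast a perfect split with a partial one.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

bag = ["red"] * 5 + ["blue"] * 5

# Hypothetical split A separates the colors perfectly.
split_a = [["red"] * 5, ["blue"] * 5]
# Hypothetical split B leaves both boxes partly mixed.
split_b = [["red"] * 4 + ["blue"] * 1, ["red"] * 1 + ["blue"] * 4]

print(f"Gain from split A: {information_gain(bag, split_a):.3f} bits")
print(f"Gain from split B: {information_gain(bag, split_b):.3f} bits")
```

Split A removes all of the bag’s 1 bit of uncertainty, while split B removes only part of it, so a tree using information gain would prefer A.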

How Decision Trees Choose the Best Split

So, how does the decision tree decide where to make each split? It tries out different splits, calculates the Gini Impurity or Information Gain for each one, and picks the split that reduces impurity the most (or gains the most information).

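Going back to our marble example, suppose the goal is boxes that each hold a single color. The decision tree might try splitting the marbles by size, by weight, and so on, and it would choose the split that results in the purest (closest to single-color) boxes. It can’t split on color itself, because color is the answer we’re trying to predict.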
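Here is a sketch of that search for a single numeric feature, assuming each marble has a hypothetical weight in grams. It tries a threshold at the midpoint between every pair of neighboring values and keeps the one with the lowest size-weighted Gini impurity. Libraries such as scikit-learn do this far more efficiently and over many features, but the idea is the same.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: marble weights in grams and their colors.
weights = [3.1, 3.4, 3.6, 5.0, 5.2, 5.9]
colors = ["red", "red", "red", "blue", "blue", "red"]

threshold, score = best_threshold(weights, colors)
print(f"Best split: weight <= {threshold} (weighted Gini = {score:.3f})")
```

Splitting at 4.3 g puts three red marbles in one box (pure) and leaves only one red marble among the two blue ones, which beats every other threshold.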

So, as you can see, decision trees make decisions much like we do, by asking questions, weighing options, and choosing the most informative path. They’re powerful tools that can help us make sense of complex information.


Root Node, Internal Node, Leaf Node

A Decision Tree works a bit like the game “20 Questions.” You start at the top with a single question that splits into possible answers. Each answer leads to another question, and so on until you get the final answer. The spots where we ask questions are called nodes. There are three types of nodes:

  • Root Node: This is the topmost node, where we begin. It represents the entire dataset, which gets divided into two or more subsets that are each more homogeneous than the whole.
  • Internal Node: These are the nodes on which we ask further questions after the root node. They help us make additional decisions.
  • Leaf Node: These are the final nodes, where we arrive at the decision (or in other words, the output of our decision tree).
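If you train a tree with scikit-learn, you can see these three node types in the fitted model. The sketch below, assuming scikit-learn is installed and using the Iris dataset that appears later in this article, counts them through the low-level `tree_` attribute, where a child index of -1 marks a leaf. The `max_depth=3` setting is only there to keep the tree small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

t = clf.tree_  # low-level array representation of the fitted tree
# In scikit-learn, a node is a leaf when it has no children (children_left == -1).
is_leaf = t.children_left == -1

n_leaves = int(is_leaf.sum())
n_internal = t.node_count - n_leaves - 1  # every non-leaf node except the root (node 0)

print("Root node: node 0, splits on feature", t.feature[0], "at threshold", round(t.threshold[0], 2))
print("Internal nodes:", n_internal)
print("Leaf nodes:", n_leaves, "(clf.get_n_leaves() agrees:", clf.get_n_leaves(), ")")
```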

Splitting Criteria (Gini Impurity, Information Gain)

Asking the right question at each node is crucial. Two common methods for choosing the best question (or split) are Gini Impurity and Information Gain:

  • Gini Impurity: This method measures the probability that a randomly chosen sample from a branch would be classified incorrectly if it were given a random label drawn according to the label distribution in that branch. Lower is better; 0 means the branch is pure.
  • Information Gain: This method measures how much information a question gives us, or more technically, the reduction in entropy. It helps to identify the splits that give the most useful information for classification.
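In scikit-learn, the choice between the two is the `criterion` parameter of `DecisionTreeClassifier`: `"gini"` (the default) or `"entropy"` for information gain. A quick comparison on the Iris data used later in this article might look like the sketch below. The two criteria often produce similar trees, so treat the printed numbers as something to check on your own data rather than a general rule.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

scores = {}
for criterion in ("gini", "entropy"):  # "entropy" chooses splits by information gain
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X_train, y_train)
    scores[criterion] = clf.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{criterion:>7}: depth={clf.get_depth()}, leaves={clf.get_n_leaves()}, "
          f"test accuracy={scores[criterion]:.3f}")
```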


Overfitting is when our tree is too deeply rooted in our training data (the questions are too specific), which can lead it to perform poorly with new data. It’s like studying for a test by memorizing the answers to the practice questions instead of understanding the concepts. If the actual test has different questions, the student won’t perform well.
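You can watch this happen with a quick experiment, sketched below under the assumption that scikit-learn is available. It uses a synthetic dataset with deliberately noisy labels and compares an unrestricted tree with one limited to `max_depth=3`. The exact scores depend on the random data, but the unrestricted tree fits its training data perfectly, noise included, and the gap between its training and test scores is the overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns the class of about 20% of samples at random (label noise).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for max_depth in (None, 3):  # None lets the tree grow until its leaves are pure
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=42).fit(X_train, y_train)
    results[max_depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"max_depth={max_depth}: train accuracy={results[max_depth][0]:.3f}, "
          f"test accuracy={results[max_depth][1]:.3f}")
```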


Pruning is a way to trim back an overgrown tree. It’s the process of reducing the size of the tree (removing branches) to avoid overfitting. Pruning is like simplifying a too-complex rule. It allows the model to generalize better from our training data to new data.
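scikit-learn supports this kind of post-pruning through minimal cost-complexity pruning, controlled by the `ccp_alpha` parameter. The sketch below, assuming scikit-learn and the Iris split used later in this article, computes the candidate alphas with `cost_complexity_pruning_path` and shows the tree shrinking as alpha grows. In practice you would pick alpha with a validation set or cross-validation rather than the test set; limiting growth up front with parameters such as `max_depth` or `min_samples_leaf` is the other common approach.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Compute the sequence of effective alphas for minimal cost-complexity pruning.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

leaf_counts = []
for alpha in path.ccp_alphas:
    # Guard against tiny negative values from floating-point rounding.
    alpha = max(float(alpha), 0.0)
    # A larger ccp_alpha prunes away more branches after the tree is grown.
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    leaf_counts.append(pruned.get_n_leaves())
    print(f"ccp_alpha={alpha:.4f}: leaves={pruned.get_n_leaves()}, "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```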


Defining a Practical Problem That Can Be Solved Using Decision Trees
Let’s say you work for a bank. Your job is to decide whether to give a loan to a customer or not. You have data about their income, credit score, employment status, etc. A decision tree can help you make this decision.

Implementing Decision Trees to Solve the Problem
You can create a decision tree where the root node could be a question like: “Is the credit score greater than 700?” The branches could be “Yes” and “No”. Then, for each answer, the tree could ask more questions: “Is the income more than $50,000?” or “Have they been employed for more than a year?”, and so on, until it reaches a decision.
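Written as code, such a tree is just nested if/else statements. The sketch below is hypothetical: the thresholds come from the questions above, but which branch each question sits on and the approve/decline outcomes at the leaves are illustrative assumptions, not real lending rules. A trained model would learn these from historical data instead.

```python
def loan_decision(credit_score, annual_income, years_employed):
    """Hypothetical hand-written loan tree; leaf outcomes are illustrative assumptions."""
    if credit_score > 700:                  # root node: "Is the credit score greater than 700?"
        if annual_income > 50_000:          # internal node: "Is the income more than $50,000?"
            return "approve"                # leaf
        if years_employed > 1:              # internal node: "Employed for more than a year?"
            return "approve"                # leaf
        return "decline"                    # leaf
    return "decline"                        # leaf

print(loan_decision(credit_score=720, annual_income=65_000, years_employed=3))    # approve
print(loan_decision(credit_score=720, annual_income=40_000, years_employed=0.5))  # decline
```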

Discussing the Outcomes
By following the branches of the tree based on the customer’s information, you can reach a decision on whether to give them a loan or not. This can make the process quicker and more consistent. Plus, the tree gives you a nice visual of how the decision was made.

Other Real-World Uses of Decision Trees:

  • In medicine, decision trees are used to aid diagnoses based on symptoms.
  • In business, they can help understand the factors that influence customer churn.
  • In environmental science, they can help assess risks and impacts.


The dataset we’ll be working with in this article is the Iris dataset. It’s a classic in machine learning: it’s small, simple, and varied enough to make it a perfect candidate for explaining and experimenting with different algorithms.

The Iris dataset consists of 150 samples, 50 from each of three species of Iris flowers (Iris Setosa, Iris Virginica, and Iris Versicolor). Four features were measured for each sample: the length and width of the sepals and of the petals.
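A quick way to confirm the dataset’s shape and class balance, assuming scikit-learn’s bundled copy loaded with `load_iris`, is the sketch below. Note that scikit-learn lists the species in lowercase and in the order setosa, versicolor, virginica, which is what the labels 0, 1, and 2 refer to.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)             # (samples, features)
print(iris.feature_names)          # the four measurements
print(iris.target_names.tolist())  # the three species, in label order 0, 1, 2
print(np.bincount(iris.target))    # number of samples per species
```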

This dataset is interesting for our Decision Tree Classifier because the task is to predict the species of the flower based on these four features. It’s a simple, yet challenging task that beautifully showcases the power of Decision Trees.


Let’s dive into the implementation. We’ll go step-by-step from importing the required libraries to evaluating the performance of our Decision Tree Classifier. Let’s start coding:

# Step 1: Import Required Libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn import tree
import matplotlib.pyplot as plt

# Step 2: Load the Dataset
iris = datasets.load_iris()

# Step 3: Prepare the Dataset
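X = iris.data    # features: sepal length/width and petal length/width (cm)
y = iris.target  # labels: 0 = setosa, 1 = versicolor, 2 = virginica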

# Let's split our data into a training set and a testing set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) 

# Step 4: Training the Model
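# No random_state is set, so the fitted tree (and the scores below) can vary slightly between runs
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)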

# Step 5: Making Predictions
y_pred = clf.predict(X_test)

# Step 6: Evaluating the Predictions
conf_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_mat)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Step 7: Visualizing the Decision Tree
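plt.figure(figsize=(12, 8))
tree.plot_tree(clf, filled=True, feature_names=iris.feature_names, class_names=list(iris.target_names))
plt.show()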


Here is what our Decision tree looks like:


Cracking the mystery of our decision tree’s performance involves two key pieces of evidence: the confusion matrix and the classification report. But don’t worry, understanding them is as easy as pie, once we break it down.

Let’s start with the confusion matrix:

Confusion Matrix:

Think of this matrix as a report card for our decision tree, showing how many times it got the answers right or wrong. The rows represent the actual species and the columns represent the species our model predicted. There are three categories (0 = setosa, 1 = versicolor, 2 = virginica), which is why our matrix is a 3×3 grid.

The diagonal line going from the top left to bottom right (14, 17, and 12) is the number of correct predictions our model made. So, our model predicted category 0 correctly 14 times, category 1 correctly 17 times, and category 2 correctly 12 times. All the other places in the matrix represent the times our model made an incorrect prediction. In our case, there is only one misclassification each for categories 1 and 2.

Now, onto the classification report:

Classification Report:

This report provides a deeper dive into our model’s performance. Here’s what it all means:

  1. Precision: This tells us how often our model is correct when it predicts a given category. For example, when our model predicts category 0, it’s right 100% of the time! For category 1, it’s correct 94% of the time, and for category 2, it’s correct 92% of the time.
  2. Recall: This measures how well our model finds all the cases that actually belong to a category. Our model correctly identifies all instances of category 0, 94% of category 1, and 92% of category 2.
  3. F1-score: This is like a report card summary. It’s the harmonic mean of precision and recall, giving us a single number to evaluate our model. Closer to 1 is better, and our model does pretty well, with scores of 1.00, 0.94, and 0.92 for categories 0, 1, and 2 respectively.
  4. Support: This tells us how many instances of each category there are in the test set, the data the report was computed on.

The overall accuracy of our model, shown at the bottom of the report, is 0.96: it classified 43 of the 45 test samples correctly, which is pretty awesome!
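To see where these numbers come from, here is a sketch that recomputes them by hand with NumPy. The matrix is an assumption: its diagonal (14, 17, 12) matches the counts described above, and the placement of the two errors (one versicolor predicted as virginica, one virginica predicted as versicolor) is what the reported precision and recall imply. Rows are actual classes and columns are predicted classes, as in scikit-learn’s `confusion_matrix`.

```python
import numpy as np

# Assumed confusion matrix consistent with the results described above
# (rows = actual class, columns = predicted class; 0 = setosa, 1 = versicolor, 2 = virginica).
cm = np.array([[14, 0, 0],
               [0, 17, 1],
               [0, 1, 12]])

correct = np.diag(cm)                 # correct predictions per class
precision = correct / cm.sum(axis=0)  # divide by how often each class was predicted
recall = correct / cm.sum(axis=1)     # divide by how often each class actually occurs
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)              # number of test samples per actual class
accuracy = correct.sum() / cm.sum()

print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print("f1-score: ", f1.round(2))
print("support:  ", support)
print("accuracy: ", round(float(accuracy), 2))
```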


We’ve seen that our decision tree does a pretty impressive job at classifying our dataset. But how does it stack up against other classification algorithms like Logistic Regression, Naive Bayes, or KNN Classifier? Well, let’s find out!

One major advantage of decision trees is their simplicity and interpretability. When you look at a decision tree, it’s pretty clear how a decision is made. It’s like following a map to a treasure! Other algorithms can be harder to explain decision by decision: logistic regression summarizes its reasoning as a set of weights, Naive Bayes as probabilities, and KNN as a vote among neighbors. They can also make great predictions, but tracing exactly why they made a particular decision is usually less direct.
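That transparency is easy to demonstrate. Assuming scikit-learn is installed, `sklearn.tree.export_text` turns a fitted tree into plain-text if/else rules. Here a shallow tree (`max_depth=2`, chosen only for readability) is trained on the full Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text renders the learned tree as nested rules, one line per node.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each line is a threshold test on one feature, and each `class:` line is a leaf, so you can trace any prediction by hand.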

However, decision trees are not always the winner. Left unchecked, they can get caught up in tiny details of the training data. This is known as overfitting: the model is too specific to the training data and struggles to make good predictions on new, unseen data. Simpler models such as Naive Bayes or Logistic Regression, which make stronger assumptions about the data, are often less prone to this, and KNN can be made more or less general by choosing the number of neighbors.

Lastly, decision trees can get really complex when there are lots of categories or features, leading to a big, confusing tree. In such cases, simpler models like logistic regression or even Naive Bayes can be more useful.

Remember, there’s no one-size-fits-all in the world of machine learning. Sometimes the decision tree will be your go-to algorithm, other times it might be a different one. The trick is understanding the strengths and weaknesses of each, and choosing the best tool for your specific task!


Just like a coin has two sides, decision trees come with their own set of advantages and limitations. Let’s explore them one by one.

Advantages of Using Decision Trees

  • Simplicity: Decision trees are like a game of 20 questions. They ask binary (yes/no) questions about the data until they arrive at a prediction. It’s as simple as choosing between two paths at each step!
  • Interpretability: Decision trees are highly interpretable and can be visualized easily, making them perfect for presentations. They don’t require any fancy math to understand. You can see exactly how the model makes decisions, just by following the branches and the leaves of the tree.
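  • Minimal Data Preparation: Unlike many other algorithms, decision trees don’t need feature scaling, because each split compares a single feature to a threshold, and they are fairly robust to outliers in the input features, because only the ordering of values matters. In principle they can handle categorical data directly, although scikit-learn’s implementation expects categories to be encoded as numbers first. This saves lots of time and makes them very handy.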
  • Non-Parametric: Decision trees are non-parametric, which means they make no assumptions about the underlying data distribution. This is an advantage over other methods that have strict assumptions.
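One practical caveat: scikit-learn’s `DecisionTreeClassifier` expects numeric input, so string categories need to be encoded first. A minimal sketch with a hypothetical toy picnic table, using `OrdinalEncoder` (which maps each category to an integer) and leaving the numeric temperature column unscaled:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: [weather, temperature in degrees C] -> picnic location.
weather = np.array([["sunny"], ["rainy"], ["sunny"], ["cloudy"], ["rainy"], ["cloudy"]])
temperature = np.array([[25.0], [18.0], [30.0], [22.0], [15.0], [12.0]])
location = np.array(["outdoors", "indoors", "outdoors", "outdoors", "indoors", "indoors"])

encoder = OrdinalEncoder()
weather_codes = encoder.fit_transform(weather)  # each category becomes an integer code
X = np.hstack([weather_codes, temperature])     # numeric column used as-is, no scaling

clf = DecisionTreeClassifier(random_state=0).fit(X, location)

new_day = np.hstack([encoder.transform([["sunny"]]), [[27.0]]])
print(clf.predict(new_day))
```

Ordinal codes impose an arbitrary order on the categories; the tree can still split on them, but one-hot encoding is a common alternative when that order is misleading.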

Limitations of Using Decision Trees

  • Overfitting: This is the most common problem. Trees can grow very complex and fit the noise in the data rather than the signal, i.e., they learn from the specifics of the training data, which can be detrimental to their performance on unseen data. This is like memorizing the answers for a test instead of understanding the underlying concept.
  • Instability: Decision trees can be unstable, meaning even small changes in data can result in a completely different tree. This is like being highly influenced by rumors — if a rumor changes slightly, your whole opinion might change.
  • Biased Learning: Decision trees are biased towards features with more levels. In other words, they are more likely to choose features with a large number of distinct values or categories for splitting.
  • Problem with Diagonal Decision Boundary: Decision trees struggle with problems where the decision boundary is diagonal. Each split compares a single feature to a threshold, so every boundary is perpendicular to a feature axis, and a diagonal boundary has to be approximated by a staircase of many splits.
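A small experiment makes the staircase effect visible. Assuming scikit-learn and NumPy, the sketch below labels random points in the unit square in two ways, once with a boundary perpendicular to a feature axis and once with the diagonal boundary x0 = x1, and compares how many leaves an unrestricted tree needs for each:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))  # two features, uniformly spread over the unit square

axis_aligned = (X[:, 0] > 0.5).astype(int)  # boundary perpendicular to feature 0
diagonal = (X[:, 0] > X[:, 1]).astype(int)  # boundary along the diagonal x0 = x1

leaves = {}
for name, labels in [("axis-aligned", axis_aligned), ("diagonal", diagonal)]:
    clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
    leaves[name] = clf.get_n_leaves()
    print(f"{name:>12} boundary: {clf.get_n_leaves()} leaves, depth {clf.get_depth()}")
```

The axis-aligned boundary needs a single split, while the diagonal one needs many small steps to separate the same number of points.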


Well, there you have it! We’ve journeyed together through the fascinating world of decision trees, discovering how they make decisions, exploring their structure, and understanding their strengths and weaknesses. Just like the many branches of a tree, our understanding of decision trees has branched out, providing us with a comprehensive knowledge of this powerful tool.

As with any tool, the key lies in understanding when and how to use it effectively. Decision trees offer simplicity and interpretability, making them a great first-choice algorithm for classification problems. But, remember to be mindful of their limitations, particularly their tendency to overfit and their sensitivity to minor changes in data.

In the end, our machine learning adventure is all about understanding these different tools and learning how to combine them in the most effective way. So, let’s keep this learning spirit high as we saddle up for our next journey, where we will explore ensemble methods, starting with Random Forests, which build on the simple decision tree to create a more robust and accurate model. Stay tuned and keep exploring!

Don’t forget, learning is like growing a tree. It might be slow, but with time and patience, it will branch out beautifully, providing shade and fruits of knowledge. Keep growing, and keep learning!


© Let’s Data Science

