In my previous projects (click here for the full list), I focused on classification and explained how to use one of the many classification algorithms, the k-nearest neighbors (kNN) classifier. In this post, I am going to continue with classification by introducing another algorithm: the decision tree classifier. The full code is available in my repository.
How does a decision tree classifier work?
Before digging into how to build the algorithm and solve a practical problem, I am first going to explain how it works. Implementing the code without any idea of the theory behind Machine Learning is unproductive, especially once you start tackling harder problems.
A decision tree is a sequence of conditions that splits the data iteratively (one node after another, essentially) until each sample can be assigned to a label. New data simply follows the tree and ends up in the most suitable category.

Training is what sets these conditions, and we can also play with hyperparameters to shape the tree the way we want. The algorithm is much more technical and deserves a dedicated post; for now, though, this is all the theory you need to implement a decision tree, especially because you are not yet going to tune its hyperparameters.
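To make this concrete, here is a minimal hand-written sketch of what a trained tree boils down to: nested threshold conditions on the features. The thresholds below are invented for illustration only; they are not the ones our model will actually learn.

# a toy "decision tree" for illustration: the thresholds are made up,
# the real ones are learned automatically during training
def toy_iris_tree(petal_length, petal_width):
    if petal_length < 2.5:       # first condition (root node)
        return 'setosa'
    elif petal_width < 1.8:      # second condition (internal node)
        return 'versicolor'
    else:
        return 'virginica'       # leaf: the most suitable label

print(toy_iris_tree(petal_length=1.4, petal_width=0.2))  # -> setosa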
Should I study decision tree classifiers?
One pertinent question before even starting to study this algorithm in depth is: is it worth it? Neural networks and deep learning are quickly gaining traction and are able to solve most problems. Of course, you would not want to commit to a technology that is already becoming obsolete.
- Believe it or not, for smaller problems, decision tree classifiers are still in use.
- There is also the fact that standard classification algorithms are part of the curriculum of every Machine Learning engineer.
- Some newer technologies are built on top of classic ML algorithms. For example, vector databases and vector search use the same nearest-neighbor idea as the kNN classifier.
It is difficult to immediately see the advantage of knowing such algorithms when there are newer approaches ready to replace them; at least for me, studying them made sense, now that I am working as an expert in the field.
Structuring the algorithm
The first step in building any algorithm, after understanding the theory clearly, is to outline the steps needed to build it. In the case of our decision tree classifier, these are the steps we are going to follow:
- Importing the dataset
- Preprocessing
- Feature and label selection
- Train and test split
- Train the model
- Make a prediction
- Visualize decision tree
1. Importing the dataset
The iris dataset is probably the dataset most used by Machine Learning beginners; I will explain why later on. There are two ways I could have downloaded it: from Kaggle, or through an API from one of the libraries (all the top ML libraries ship sample datasets for experimentation purposes). I decided to download it from the sklearn library, but because it comes in the form of a dictionary (which is not intuitive, especially when you are starting out), I converted it into a CSV and decided to post the code for your curiosity.
***YOU CAN IGNORE THIS CODE if you are not interested: you will find the final CSV in the repository, so do not worry about it.
import pandas as pd
from sklearn.datasets import load_iris

# load the sample dataset bundled with sklearn (a dictionary-like object)
iris = load_iris()

# map each numeric target (0, 1, 2) to its species name
labels = [iris['target_names'][t] for t in iris['target']]

# build a DataFrame with the four features plus a final 'label' column
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
df['label'] = labels

# save it as a plain CSV without the pandas index
df.to_csv('iris_dataset.csv', index=False)
You can start from here if you wish: simply import the completed CSV directly. This is how it will look (I decided to add column names):
import pandas as pd
df = pd.read_csv('iris_dataset.csv')
df

2. Preprocessing
The dataset is already perfect, so we do not need to preprocess it. During this phase it is common to analyze the data to understand it better, but I will leave some of that analysis for later sections so you can better follow the flow of the project.
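If you do want a quick look at the data at this stage, a couple of standard pandas calls are enough; this is just an optional peek, nothing in the project depends on it.

# optional quick look at the data
df.info()                    # column types and missing values
df.describe()                # basic statistics of the four features
df['label'].value_counts()   # 50 samples per species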
3. Feature and label selection
As usual, I will need to separate features from labels. In two sentences, a supervised AI program (such as this one) is structured like this: we start from tabular data, in our case the iris dataset. We divide the columns into two kinds: the columns we wish to predict (labels) and the ones that will act as predictors (features). Once they are split, we can train the model.
If this is your first project and you need some basic AI theory, you can dedicate a few minutes to reading this guide, which will make things clearer while working on this project.
# the four measurement columns are the features, the species is the label
X = df[list(df.columns[0:4])]
y = df[['label']]

Question: why is the iris dataset perfect for classification?
The iris dataset is used for a very particular reason: the feature values of the three species are very distinct from each other and barely overlap. This means that when the classification algorithm is trained, there is little ambiguity in assigning a sample to one label or another. Problems arise when the distributions overlap.
We can use a violin plot to look at the distributions side by side.
import plotly.graph_objects as go

# one violin per feature, with box and mean line shown
fig = go.Figure()
for feature in list(df.columns[0:4]):
    fig.add_trace(go.Violin(y=df[feature],
                            name=feature,
                            box_visible=True,
                            meanline_visible=True))
fig.show()

Another way to analyze the features is a parallel coordinates plot, which lets us visualize every sample of the dataset as a line crossing parallel axes.
import plotly.express as px

df_iris = px.data.iris()
fig = px.parallel_coordinates(df_iris, color="species_id",
                              labels={"species_id": "Species",
                                      "sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
                                      "petal_width": "Petal Width", "petal_length": "Petal Length"},
                              color_continuous_scale=px.colors.diverging.Tealrose,
                              color_continuous_midpoint=2)
fig.show()

We can see that the feature values belonging to each label are so different that we can tell the species apart by eye. If we had a new flower with a petal width of 2.5, for example, we could immediately see from the graph that it belongs to species 3.
4. Train and test split
As usual, to measure the accuracy of our classification algorithm I will need to split the dataset into train and test samples. If there is no validation set, the usual split is 80:20 for train:test. Of course, you can experiment a bit to see how results vary; I used 0.24 because I feel more comfortable with a roughly 3:1 proportion.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.24)
5. Train the model
It is now time to finally train the model with a decision tree classifier. The sklearn API for this purpose is quite simple: I just need to fit the model on the training sets (X_train and y_train). It is important that the model is then tested on data it has never seen before, so we calculate the accuracy score using X_test and y_test:
from sklearn.tree import DecisionTreeClassifier

# fit the tree on the training data, then score it on the unseen test data
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
1.0
Amazing! Our decision tree has reached a stunning 100% accuracy! Magic? NO: incredibly good data. I chose this dataset on purpose to show you that data really makes the difference when creating a model. In some cases, even with my experience, I cannot get more than 30% accuracy, simply because some datasets have no pattern to learn.
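If a perfect score on a single split feels suspicious, one optional way to double-check it is cross-validation, which trains and scores the model on several different splits. A minimal sketch using sklearn's cross_val_score (the exact fold scores will vary slightly):

# optional sanity check: score the model on 5 different train/test splits
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y['label'], cv=5)
print(scores)          # one accuracy value per fold
print(scores.mean())   # average accuracy across the folds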
6. Make a prediction
You do not necessarily need to follow the code after this section; I am extending it a bit to provide you with some valuable extra material. Now that the model is complete, I will show you how to make a prediction and visualize it. X_test is the dataset I want to predict on.
display(X_test[0:10])

Actually, I am going to attach the predictions to the original dataset, so that we can see the match between features and labels again.
#show predicted dataset
pd.concat([X_test.reset_index(drop=True), pd.DataFrame(clf.predict(X_test))], axis=1)

As you can see, I used a rather complicated line of code to merge the X_test dataset with the predictions, so that column 0 holds the predictions made by the decision tree classifier. Note that there is no limit to what we can do; we could even add a column with the original values so that we can compare them, as sketched below.
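Here is a sketch of how that comparison could look; the column names 'predicted' and 'actual' are just my own choice for illustration.

# side-by-side comparison of the predictions and the true labels
comparison = pd.concat(
    [X_test.reset_index(drop=True),
     pd.DataFrame(clf.predict(X_test), columns=['predicted']),
     y_test.reset_index(drop=True).rename(columns={'label': 'actual'})],
    axis=1)
comparison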
7. Visualize decision tree
The beauty of using different models is the ability to visualize how each of them makes its decisions. There is a notion that AI is a black box, which is only partly true. A recently developed set of practices, called interpretable AI, focuses on building tools that allow developers to understand why a model makes certain decisions. This is very useful in AI ethics, to prevent the model from discriminating.
from sklearn import tree
from matplotlib import pyplot as plt

# draw the fitted tree and save the figure as a PDF
tree.plot_tree(clf)
plt.savefig('out.pdf')
plt.show()
In our case, I exported the logical structure of the decision tree as a PDF. Essentially, when the tree has to make a prediction for a new sample, the sample passes through the tree and the leaf it ends up in gives the most suitable label.
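If the PDF plot is hard to read, sklearn can also print the same structure as plain text rules; a short optional sketch:

# print the tree as plain text rules instead of a figure
from sklearn.tree import export_text
print(export_text(clf, feature_names=list(X.columns)))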

The end?
What’s next -> Naive Bayes Classifier on Spicy Pepper Classifiers
Did you find this guide useful? If you wish to explore more code, you can check the list of projects that will progressively help you learn data science. If you wish to have a general idea of what you are learning, we have prepared a guide that contains the list of the most important concepts you will need to learn to become a Machine Learning engineer.
Join our free programming community on discord, learn how to code, and meet other experts