

# Decision tree visualization Python code

Visualizing the decision trees can be really simple using a combination of scikit-learn and matplotlib. However, there is a nice library called dtreeviz, which brings much more to the table and creates visualizations that are not only prettier but also convey more information about the decision process. In this article, I will first show the “old way” of plotting the decision trees and then introduce the improved approach using dtreeviz.

As always, we need to start by importing the required libraries. Then, we load the Iris data set from scikit-learn. We will also go over a regression example, but we will load the Boston housing data set for this later on.

The next step involves creating the training/test sets and fitting the decision tree classifier to the Iris data set. In this article, we focus purely on visualizing the decision trees. Thus, we do not pay any attention to fitting the model or finding a good set of hyperparameters (there are a lot of articles on these topics). The only thing that we will “tune” is the maximum depth of the tree: we constrain it to 3, so the trees can still fit in the image and remain readable.

Now that we have a fitted decision tree model, we can proceed to visualize the tree. We start with the easiest approach: using the plot_tree function from scikit-learn. The default output is not very readable, though. For example, there are no feature names (only their column indices) and no class labels. We can easily improve that by passing the feature names and class labels to plot_tree.

Much better! Now, we can quite easily interpret the decision tree. It is also possible to use the graphviz library for visualizing the decision trees. However, the outcome is very similar, with the same set of elements as the plot_tree graph. That is why we will skip it here, but you can find the implementation in the Notebook on GitHub.

Having seen the old way of plotting the decision trees, let’s jump right into the dtreeviz approach. The dtreeviz call is pretty much self-explanatory, so we can move on to the outcome.

First of all, let’s take a moment to acknowledge how big of an improvement it is, especially given that the function call is very similar. At each node, we can see a stacked histogram of the feature that is used for splitting the observations, colored by class. This way, we can see how the classes are segregated by each split. The small triangle with the value at the x-axis is the splitting point. In the first histogram, we can clearly see that all observations of the setosa class have petal length smaller than 2.45 cm.
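The scikit-learn steps described earlier in the article (loading Iris, splitting the data, fitting a tree constrained to depth 3, and plotting it with plot_tree) can be sketched roughly as follows. This is a minimal sketch rather than the code from the original Notebook: the `random_state` values, the figure size, and the output file name `iris_tree.png` are assumptions made purely for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the Iris data set that ships with scikit-learn
iris = load_iris()

# Create the training/test sets (random_state=42 is an arbitrary, assumed seed)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=42
)

# Fit the classifier; the only "tuned" setting is the maximum depth of 3
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Draw the same tree twice, side by side (figure size is an assumption)
fig, axes = plt.subplots(1, 2, figsize=(24, 8))

# Left: the default call, which shows only column indices and no class labels
plain_nodes = plot_tree(clf, ax=axes[0])

# Right: the more readable call with feature names, class names and colors
labeled_nodes = plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=axes[1],
)

# Hypothetical output file name, used here only for illustration
fig.savefig("iris_tree.png")
```

The left panel corresponds to the bare plot_tree call, while the right panel shows the effect of passing `feature_names` and `class_names`. Setting `filled=True` additionally colors each node by its majority class.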
