Missing the Forest for the Decision Trees

Joe Ramirez
3 min read · Jan 26, 2021

Much ado is made about the Random Forest machine learning algorithm, and for good reason: Random Forest is an extremely powerful modeling tool. To fully understand how the Random Forest algorithm works, we must first focus on what a Random Forest is made of: Decision Trees!

To better understand Decision Trees, we first need some terminology. The root node is the very beginning of the decision tree: it holds the entire dataset we want to split apart, and it is the reason we are using the Decision Tree in the first place. We break the root node down into subsequent nodes to better understand our data. Next, we have interior nodes, which have both incoming and outgoing edges. An incoming edge is whatever feeds into the node, so the incoming edge of our very first interior node comes from the root node. An interior node can break down into further interior nodes, or it can feed into a leaf node, which has no outgoing edges. In other words, a leaf node ends that branch of the Decision Tree; it will not be broken down any further.
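To make these terms concrete, here is a minimal sketch (assuming Python and scikit-learn, neither of which the post specifies) that fits a small tree and prints its structure, with the root node's split at the top and the leaf nodes at the ends of each branch:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree so the printout stays readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The first split shown is the root node; indented splits are interior
# nodes; lines ending in "class: ..." are leaf nodes with no outgoing edges.
print(export_text(tree))
```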

When we split nodes in a decision tree, we want the resulting nodes to be as pure and homogeneous as possible. One way to measure this is entropy, a measure of uncertainty: an entropy of 0 means a node is completely homogeneous, containing only a single class. As an example, think of starting with a class of students and dividing them into two nodes: one with girls and the other with boys. Each of these nodes would have an entropy of 0, as they are completely homogeneous. There are no boys in the girls node and vice versa. In real-world practice, nodes are rarely completely pure, so we use entropy as a measure of the quality of our nodes.
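As a rough sketch of the idea (again assuming Python, here with NumPy), the entropy of a node's class labels can be computed directly from the class proportions:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                   # class proportions in the node
    return float(np.sum(p * np.log2(1 / p)))    # equivalent to -sum(p * log2(p))

print(entropy(["girl"] * 10))                # 0.0 -- a completely pure node
print(entropy(["girl"] * 5 + ["boy"] * 5))   # 1.0 -- a maximally mixed node
```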

As interior nodes are split into further interior nodes or leaf nodes, each split should increase the information gain of our decision tree. Information gain is the entropy of the parent node minus the weighted average entropy of its child nodes, so a worthwhile split is one whose children are, on average, purer than their parent. Otherwise, what was the point of breaking down the parent node in the first place? As a result, you will find that at every split, the decision tree algorithm tries to maximize the information gain as much as possible.
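Continuing the hypothetical Python sketch, information gain for a candidate split can be computed like this; a split is only worth making if the gain is positive:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1 / p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted_child_entropy = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted_child_entropy

parent = ["boy"] * 5 + ["girl"] * 5
perfect_split = [["boy"] * 5, ["girl"] * 5]   # both children are pure

print(information_gain(parent, perfect_split))  # 1.0 -- a full bit of information gained
```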

Conclusion

Next time you use a Random Forest classifier in a model, my hope is that this blog post will have given you a better understanding of how the individual Decision Trees that make up that Random Forest classifier actually work!
