In regression tasks, it’s often assumed that decision trees are more robust to outliers than linear regression. See this Quora question for a typical example. I believe this is also mentioned in the book “An Introduction to Statistical Learning”, which may be the source of the notion. Predictions from a decision tree are based on the average of the training instances at the leaf, and this averaging should dampen the influence of outliers, or so the argument goes.

However, in a notebook, I demonstrated that decision trees can sometimes react more poorly to outliers. Specifically, a single outlier increased the sum of squared errors more for a decision tree than for a linear regression. Depending on the specifics, a tree can put an outlier on its own leaf, which can lead to some spectacular failures in prediction: that leaf then predicts the outlier value exactly.
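The own-leaf failure mode is easy to reproduce. Below is a minimal sketch (not the original notebook; the dataset and model settings are illustrative assumptions) using scikit-learn. An unconstrained `DecisionTreeRegressor` grows until leaves are pure, so the outlier ends up alone on its leaf and the tree reproduces it verbatim, while the fitted line is only nudged:

```python
# Illustrative sketch: how a tree vs. a line responds to one injected outlier.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, 0.5, size=50)  # underlying trend: y ≈ 2x
y[25] += 40                                      # inject a single large outlier

# Default tree grows until leaves are pure: the outlier gets its own leaf.
tree = DecisionTreeRegressor(random_state=0).fit(X, y)
# Least squares is pulled toward the outlier, but only slightly.
lin = LinearRegression().fit(X, y)

x_out = X[25].reshape(1, -1)
print("trend value at outlier x:", 2 * X[25, 0])
print("tree prediction:", tree.predict(x_out)[0])   # memorizes the outlier
print("linear prediction:", lin.predict(x_out)[0])  # stays near the trend
```

At the outlier's x, the tree's error relative to the underlying trend is roughly the full size of the outlier, while the linear model's error is a small shift in intercept and slope. Capping tree depth or `min_samples_leaf` would blunt this, which is exactly the kind of application-specific choice the conclusion below argues for.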

This is not to say that linear regression is always, or even generally better at handling outliers. Rather, it is best not to assume one or the other technique will outperform in all cases. As usual, the specifics of a given application should dictate the techniques used.


## Published by Matthew Theisen

Matthew Theisen is a data scientist working in internet media. Contact: @RealTheisen