An Error Model for Random Forest Regression

Sam Carliles, Johns Hopkins University

Random Forest regression is a non-parametric regression technique that performs competitively with other commonly used techniques. A Random Forest arrives at its regression estimate by training an ensemble of randomized regression trees on bootstrap samples drawn from the training set. We describe a novel interpretation of the individual regression tree estimates which implies asymptotically normally distributed regression errors and suggests estimators for the parameters of the error distribution of each new test object, independent of other test objects. We demonstrate this technique on several data sets, and we offer a theoretical motivation for why this interpretation, in some form, should apply to data with an arbitrary underlying distribution.
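
As a rough illustration of the procedure summarized above, the following sketch grows randomized regression trees on bootstrap samples and then summarizes the per-tree estimates for a single test object. It is a minimal sketch under stated assumptions, not the method as specified in the paper: the abstract does not give the estimators, so the sample mean and sample standard deviation of the tree estimates are assumed here as the parameters of a normal error distribution. The function names (best_split, grow, predict_tree, fit_forest, error_model), the synthetic data, and all tuning values are hypothetical.

```python
import math
import random
import statistics

def best_split(rows, f):
    """Return the threshold on feature f that minimises the summed squared error, or None."""
    rows = sorted(rows, key=lambda r: r[0][f])
    n = len(rows)
    total = sum(y for _, y in rows)
    total_sq = sum(y * y for _, y in rows)
    best_sse, best_t = None, None
    left_sum = left_sq = 0.0
    for i in range(1, n):
        y = rows[i - 1][1]
        left_sum += y
        left_sq += y * y
        if rows[i][0][f] == rows[i - 1][0][f]:
            continue  # cannot split between identical feature values
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best_sse is None or sse < best_sse:
            best_sse, best_t = sse, (rows[i - 1][0][f] + rows[i][0][f]) / 2
    return best_t

def grow(rows, rng, min_split=10):
    """Grow one randomized tree; a leaf is a float, an internal node is a tuple."""
    leaf = statistics.fmean([y for _, y in rows])
    if len(rows) < min_split:
        return leaf
    f = rng.randrange(len(rows[0][0]))  # randomization: one random feature per node
    t = best_split(rows, f)
    if t is None:
        return leaf
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    if not left or not right:
        return leaf
    return (f, t, grow(left, rng, min_split), grow(right, rng, min_split))

def predict_tree(node, x):
    """Follow the splits from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(rows, n_trees=50, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [grow([rng.choice(rows) for _ in rows], rng) for _ in range(n_trees)]

def error_model(forest, x):
    """ASSUMED estimators: mean and standard deviation of the per-tree estimates."""
    estimates = [predict_tree(tree, x) for tree in forest]
    return statistics.fmean(estimates), statistics.stdev(estimates)

# Synthetic training data (hypothetical): y = sin(x0) + 0.5 * x1 + Gaussian noise.
data_rng = random.Random(1)
train = []
for _ in range(300):
    x = (data_rng.uniform(0, 3), data_rng.uniform(0, 3))
    train.append((x, math.sin(x[0]) + 0.5 * x[1] + data_rng.gauss(0, 0.1)))

forest = fit_forest(train)
x_test = (1.5, 1.5)
truth = math.sin(1.5) + 0.5 * 1.5
mu, sigma = error_model(forest, x_test)  # computed for this object alone
print("estimate:", mu, "spread:", sigma)
print("standardized error:", (truth - mu) / sigma)
```

Because error_model uses only the trees' estimates for the one object passed in, the parameters for each test object are obtained without reference to any other test object. Whether the standardized error is in fact approximately standard normal is the empirical question the paper addresses; the sketch does not establish it.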