Reduced Robust Random Cut Forest for Out-of-Distribution Detection in Machine Learning Models

  • Minchala Siva Krishna, K. Naveen

Abstract

Most machine-learning-based regressors extract information from data collected through past observations in order to make predictions about the future. Consequently, when the input to a trained model has statistical properties significantly different from those of the training data, there is no guarantee of accurate prediction. Using such a model on out-of-distribution input may therefore produce an outcome far from the desired one, which is not only erroneous but can also be hazardous in some cases. Successful deployment of these models in any system requires a detection mechanism that can distinguish out-of-distribution data from in-distribution data (i.e., data similar to the training data). In this paper, we introduce a novel approach to this detection task using the Reduced Robust Random Cut Forest (RRRCF) data structure, which can be used on both small and large datasets. Like the Robust Random Cut Forest (RRCF), the RRRCF is a structured but reduced representation of the training data subspace in the form of cut trees. Empirical results on both low- and high-dimensional data show that inference about whether data lie inside or outside the training distribution can be made efficiently, and that the model is easy to train, with no difficult hyper-parameter tuning. The paper discusses two use cases for testing and validating the results.
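
To make the idea of representing training data as cut trees more concrete, the following is a minimal sketch of a generic random cut forest used as an out-of-distribution check. It is not the authors' RRRCF: the abstract does not specify the reduction step or the scoring rule, so this sketch assumes a plain isolation-depth score (points that are separated from the training sample after only a few cuts are treated as likely out-of-distribution). All function names, the subsample size, and the number of trees are hypothetical choices made for illustration.

```python
import numpy as np

def build_tree(X, rng, depth=0, max_depth=30):
    """Recursively partition the rows of X with random axis-aligned cuts."""
    if len(X) <= 1 or depth >= max_depth:
        return {"size": len(X)}  # leaf node
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo
    if span.sum() == 0:  # all remaining points are identical
        return {"size": len(X)}
    # Choose the cut dimension with probability proportional to its range,
    # then choose the cut position uniformly within that range.
    dim = rng.choice(len(span), p=span / span.sum())
    cut = rng.uniform(lo[dim], hi[dim])
    mask = X[:, dim] <= cut
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(X[mask], rng, depth + 1, max_depth),
        "right": build_tree(X[~mask], rng, depth + 1, max_depth),
    }

def path_depth(tree, x):
    """Number of cuts traversed before the query point x reaches a leaf."""
    depth = 0
    while "dim" in tree:
        tree = tree["left"] if x[tree["dim"]] <= tree["cut"] else tree["right"]
        depth += 1
    return depth

def build_forest(X, n_trees=50, sample_size=128, seed=0):
    """Build each tree on a random subsample of the training data."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
        forest.append(build_tree(X[idx], rng))
    return forest

def mean_depth(forest, x):
    """Average leaf depth over the forest; smaller means more isolated."""
    return float(np.mean([path_depth(tree, x) for tree in forest]))

# Assumed toy data: 2-D standard normal training set.
rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(512, 2))
forest = build_forest(X_train)

depth_in = mean_depth(forest, np.array([0.0, 0.0]))   # near the training mass
depth_out = mean_depth(forest, np.array([8.0, 8.0]))  # far from the training mass
print(depth_in, depth_out)
```

In this sketch a query far from the training data tends to reach a leaf after fewer cuts than a query near the centre of the data, so a threshold on the mean depth could serve as a simple in/out-of-distribution decision. The threshold itself would have to be calibrated on held-out in-distribution data.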

Published
2023-03-27
How to Cite
Minchala Siva Krishna, K. Naveen. (2023). Reduced Robust Random Cut Forest for Out-of-Distribution Detection in Machine Learning Models. Design Engineering, (1), 255 - 266. Retrieved from http://www.thedesignengineering.com/index.php/DE/article/view/9782