UPDATED | Social Sciences Brown Bag Seminar
Fuzzy Forests is a new machine learning algorithm for ranking the importance of features in high-dimensional classification and regression problems where the predictors are highly correlated and p >> n. Fuzzy Forests draws on Weighted Gene Co-Expression Network Analysis (WGCNA) to group highly correlated features into modules. The resulting modules are relatively independent of each other. Recursive feature elimination with Random Forests is then used to sieve the variables until the user is left with the set of k variables that are most important for predicting the outcome. Simulations and real-world examples show excellent performance of Fuzzy Forests, with the added bonus of slightly better prediction than Random Forests. Applications in HIV immunology identify variables that are important in predicting elite control of HIV. These variables, selected in silico, have been shown to have a biological basis for elite control and are being validated in vivo with follow-up cohorts.
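The two-stage idea described above (form modules of correlated features, sieve within each module, then sieve the survivors down to k) can be illustrated with a minimal Python sketch. This is not the speaker's implementation. It assumes NumPy and scikit-learn are available, replaces WGCNA with a simple greedy correlation threshold, and uses scikit-learn's impurity-based importances. The function names (correlation_modules, rfe_random_forest, fuzzy_forest_sketch) and the parameters threshold, drop_fraction, and keep_per_module are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def correlation_modules(X, threshold=0.7):
    """Greedy stand-in for WGCNA (an assumption, not the real method):
    group each unassigned feature with the others whose absolute
    correlation with it is at least `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = list(range(X.shape[1]))
    modules = []
    while unassigned:
        seed = unassigned[0]
        # The seed always joins its own module, even if its correlation is NaN.
        members = [j for j in unassigned
                   if j == seed or corr[seed, j] >= threshold]
        modules.append(members)
        unassigned = [j for j in unassigned if j not in members]
    return modules


def rfe_random_forest(X, y, features, keep, drop_fraction=0.25, seed=0):
    """Refit a Random Forest and drop the least important features
    until at most `keep` of them remain."""
    features = list(features)
    while len(features) > keep:
        forest = RandomForestClassifier(n_estimators=100, random_state=seed)
        forest.fit(X[:, features], y)
        order = np.argsort(forest.feature_importances_)  # least important first
        n_drop = max(1, int(drop_fraction * len(features)))
        n_drop = min(n_drop, len(features) - keep)
        dropped = set(order[:n_drop].tolist())
        features = [f for i, f in enumerate(features) if i not in dropped]
    return features


def fuzzy_forest_sketch(X, y, k, keep_per_module=2, threshold=0.7):
    """Screen within each module, then select k features among the survivors."""
    survivors = []
    for module in correlation_modules(X, threshold):
        keep = min(keep_per_module, len(module))
        survivors += rfe_random_forest(X, y, module, keep=keep)
    return rfe_random_forest(X, y, survivors, keep=k)
```

Calling fuzzy_forest_sketch(X, y, k=3) on an n-by-p NumPy array X with class labels y returns the column indices of up to three retained features. Screening each module separately is meant to keep a block of mutually correlated predictors from crowding out the rest during elimination. A regression outcome would need RandomForestRegressor in place of the classifier.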