Domain ontology-based feature reduction for high dimensional drug data and its application to 30-day heart failure readmission prediction


A high-dimensional feature space can hinder the efficiency and performance of machine learning, and high correlations between features may further increase redundancy and diminish the performance of learning algorithms. A domain ontology provides relationships and similarities between concepts in a specific area, and can thus be used to reduce redundancy by clustering concepts and revealing their functionality. In this paper, we study the problem of using high-dimensional medication data to predict the probability of 30-day heart failure readmission. We propose a feature reduction method for high-dimensional datasets that uses a combination of two drug ontologies. After building a tree structure from this combination, the method uses a greedy strategy to obtain a subset of features that may have higher correlation with the class label but lower correlation with each other. Experimental results show that our methods improve the performance of heart failure readmission prediction (using only drug data) compared with existing feature reduction methods that do not use drug domain ontologies.
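The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a greedy, tree-guided selection of this kind could look, not the authors' implementation. It assumes a toy ontology tree in which each internal node (a drug class) is represented by the logical OR of its descendant drug columns, uses absolute Pearson correlation as the relevance and redundancy measure, and accepts a candidate only if its correlation with every already-selected feature stays below a threshold. The function names, the `max_redundancy` and `min_relevance` parameters, the tree, and the patient data are all hypothetical.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 when either vector is constant."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return abs(float(np.corrcoef(a, b)[0, 1]))

def node_features(tree, leaf_columns):
    """One binary column per tree node: a leaf is a drug column,
    an internal node is the element-wise OR of its children."""
    features = {}

    def visit(node):
        if node in leaf_columns:
            features[node] = leaf_columns[node]
        else:
            features[node] = np.maximum.reduce([visit(c) for c in tree[node]])
        return features[node]

    children = {c for kids in tree.values() for c in kids}
    for root in set(tree) - children:
        visit(root)
    return features

def greedy_select(features, y, max_redundancy=0.55, min_relevance=0.0):
    """Visit nodes in order of decreasing correlation with the label and keep
    a node only if it is not too correlated with any node already kept."""
    relevance = {name: abs_corr(col, y) for name, col in features.items()}
    ranked = sorted(features, key=relevance.get, reverse=True)
    selected = []
    for name in ranked:
        if relevance[name] <= min_relevance:
            continue
        if all(abs_corr(features[name], features[s]) <= max_redundancy
               for s in selected):
            selected.append(name)
    return selected

# Hypothetical toy ontology (assumption): two drug classes under one root.
tree = {
    "cardiovascular": ["beta_blockers", "diuretics"],
    "beta_blockers": ["metoprolol", "carvedilol"],
    "diuretics": ["furosemide"],
}
# Hypothetical prescription indicators for 8 patients (assumption).
leaf_columns = {
    "metoprolol": np.array([1, 0, 1, 0, 0, 0, 0, 0]),
    "carvedilol": np.array([0, 1, 0, 1, 0, 0, 0, 0]),
    "furosemide": np.array([1, 1, 1, 0, 1, 0, 0, 0]),
}
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # toy readmission labels

features = node_features(tree, leaf_columns)
selected = greedy_select(features, y)
print(selected)
```

In this toy setting the class-level node can stand in for its individual drugs when the drugs are only informative jointly, which is the kind of redundancy reduction the abstract attributes to the ontology. A real pipeline would need the actual ontology mapping and a validated choice of correlation measure and threshold.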
