Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data.

Date
2015-10
Abstract
Dimensionality reduction has become a routine step in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. This article proposes an unsupervised feature selection technique that uses the maximum information compression index as the dissimilarity measure and the well-known density-based clustering technique DBSCAN to identify the largest natural group of dissimilar features. The algorithm is fast and less sensitive to user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method to reduce the dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with that of other well-known feature selection methods.
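The maximum information compression index named in the abstract is commonly defined as the smaller eigenvalue of the 2x2 covariance matrix of two features. It is zero when the features are linearly dependent and grows as they become less related. The following is a minimal Python sketch of that dissimilarity measure only, not the authors' implementation. The function names `mici` and `mici_matrix` are hypothetical, and the use of the population covariance (dividing by n) is an assumption.

```python
import math


def mici(x, y):
    """Maximum information compression index of two features.

    Computed as the smaller eigenvalue of the 2x2 covariance matrix
    [[var(x), cov(x, y)], [cov(x, y), var(y)]].
    Assumption: population covariance (divide by n).
    """
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    # Eigenvalues of a symmetric 2x2 matrix: (trace +/- sqrt(trace^2 - 4*det)) / 2.
    trace = var_x + var_y
    det = var_x * var_y - cov_xy ** 2
    disc = max(trace * trace - 4.0 * det, 0.0)  # clamp rounding noise
    return max(0.5 * (trace - math.sqrt(disc)), 0.0)


def mici_matrix(features):
    """Pairwise dissimilarity matrix for a list of feature columns (hypothetical helper)."""
    k = len(features)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = mici(features[i], features[j])
    return d


# Tiny illustrative data set: three features observed on four samples.
f1 = [1.0, 2.0, 3.0, 4.0]
f2 = [2.0, 4.0, 6.0, 8.0]       # exact multiple of f1, so dissimilarity is ~0
f3 = [1.0, -1.0, -1.0, 1.0]     # uncorrelated with f1
D = mici_matrix([f1, f2, f3])
```

A matrix such as `D` could then be handed to any DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`. How the article sets the DBSCAN parameters and extracts the final feature group is not stated in the abstract, so that step is left out here.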
Keywords
Clustering, dissimilarity, eigenvalue, feature selection
Citation
Sengupta Debarka, Aich Indranil, Bandyopadhyay Sanghamitra. Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data. Journal of Biosciences. 2015 Oct; 40(4): 721-730.