Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data.

Sengupta, Debarka; Aich, Indranil; Bandyopadhyay, Sanghamitra

Files

jb2015v40n4p721.pdf (689.34 KB)

Date

2015-10

Authors

Sengupta, Debarka

Aich, Indranil

Bandyopadhyay, Sanghamitra

Abstract

Dimensionality reduction has become a routine step in modelling complex biological systems. A large number of feature selection techniques have been reported in the literature to improve model performance in terms of accuracy and speed. In the present article, an unsupervised feature selection technique is proposed that uses the maximum information compression index as the dissimilarity measure and the well-known density-based clustering technique DBSCAN to identify the largest natural group of dissimilar features. The algorithm is fast and less sensitive to user-supplied parameters. Moreover, the method automatically determines the required number of features and identifies them. We used the proposed method to reduce the dimensionality of a number of benchmark data sets of varying sizes. Its performance was also extensively compared with that of other well-known feature selection methods.
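As a rough illustration of the dissimilarity measure named in the abstract, the sketch below computes the maximum information compression index for a pair of features as the smaller eigenvalue of their 2x2 covariance matrix. It is zero when the two features are linearly dependent and grows as they become less related. The function names (`mici`, `mici_matrix`) and the use of population variance are assumptions made for this example and are not taken from the article. The clustering step is not shown: how the article applies DBSCAN to these dissimilarities and selects the final group of features is not described in this record.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _covariance(xs, ys):
    # Population covariance (divides by n); an assumption of this sketch.
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def mici(xs, ys):
    """Smaller eigenvalue of the 2x2 covariance matrix of two features."""
    vx = _covariance(xs, xs)
    vy = _covariance(ys, ys)
    cxy = _covariance(xs, ys)
    trace = vx + vy
    det = vx * vy - cxy * cxy
    # Clamp at zero to guard against tiny negative values from rounding.
    disc = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace - math.sqrt(disc))


def mici_matrix(features):
    """Pairwise dissimilarity matrix; `features` is a list of columns."""
    n = len(features)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = mici(features[i], features[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    f1 = [1.0, 2.0, 3.0, 4.0]
    f2 = [2.0, 4.0, 6.0, 8.0]      # exact multiple of f1
    f3 = [1.0, -1.0, -1.0, 1.0]    # uncorrelated with f1
    for row in mici_matrix([f1, f2, f3]):
        print(row)
```

A symmetric matrix of this kind could then be handed to any density-based clustering routine that accepts precomputed distances.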

Keywords

Clustering, dissimilarity, eigenvalue, feature selection

Citation

Sengupta Debarka, Aich Indranil, Bandyopadhyay Sanghamitra. Feature selection using feature dissimilarity measure and density-based clustering: Application to biological data. Journal of Biosciences. 2015 Oct; 40(4): 721-730.

URI

https://imsear.searo.who.int/handle/123456789/181454

Collections

Journal of Biosciences
