A Review of Feature Selection Techniques for Clustering High Dimensional Structured Data

Bhagyashri A. Kelkar and Dr.S.F. Rodd


Abstract:

Data that resides in a fixed field within a record or file and has a defined schema is called structured data. Structured data is becoming increasingly important in database applications such as molecular biology, image retrieval, and XML document retrieval. Objects are usually represented as a vector of measurements, or as a point in a multidimensional feature space. To make sense of the abundance of available information, data mining and data analysis tools such as classification and clustering are used. However, when clustering or classification is applied to high dimensional data, traditional algorithms perform poorly because they treat all features as equally important in deciding the class or cluster membership of objects. Some of the dimensions are irrelevant and can confuse data mining algorithms by hiding clusters in noisy data. In addition, in some applications the cluster structure in the dataset is limited to a subset of features rather than the entire feature set. Hence feature selection has become an important preprocessing task for the effective application of data mining techniques to real-world high dimensional data sets.
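The claim that irrelevant dimensions can hide clusters can be illustrated with a small synthetic sketch. It is not a method from the paper: the data, the function names `make_data` and `separation`, and the choice of 20 uniform noise features are all assumptions made for illustration. Two clusters differ only on feature 0, and the sketch compares how well separated they look under Euclidean distance when all features are used versus only the relevant one. The cluster labels are known here only because the data is generated.

```python
import math
import random


def make_data(n_per_cluster=50, n_noise=20, seed=0):
    """Generate two clusters that differ on feature 0 only.

    All remaining features are uniform noise with no cluster structure.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, center in enumerate((0.0, 5.0)):
        for _ in range(n_per_cluster):
            relevant = rng.gauss(center, 1.0)
            noise = [rng.uniform(-10.0, 10.0) for _ in range(n_noise)]
            points.append([relevant] + noise)
            labels.append(label)
    return points, labels


def separation(points, labels, features):
    """Mean between-cluster distance divided by mean within-cluster distance.

    Distances are Euclidean and use only the given feature indices.
    A ratio close to 1 means the clusters are barely distinguishable.
    """
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.sqrt(
                sum((points[i][f] - points[j][f]) ** 2 for f in features)
            )
            if labels[i] == labels[j]:
                within.append(d)
            else:
                between.append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


points, labels = make_data()
all_features = list(range(len(points[0])))
ratio_all = separation(points, labels, all_features)
ratio_selected = separation(points, labels, [0])
print(f"all features: {ratio_all:.2f}, relevant feature only: {ratio_selected:.2f}")
```

In this setup the ratio on the full feature set stays close to 1, while the ratio on the single relevant feature is several times larger. A distance-based clustering algorithm sees much clearer structure once the noise features are dropped.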

Keywords: Feature Selection, Clustering.

Volume: 6 | Issue: Special Issue on Advances in Computer Science and Engineering and Workshop on Big Data Analytics Editors: Dr.S.B. Kulkarni, Dr.U.P. Kulkarni, Dr.S.M. Joshi and J.V. Vadavi

Pages: 176-179

Issue Date: October, 2016

DOI: 10.9756/BIJSESC.8270

This Journal is an Open Access Journal to Facilitate the Research Community