Co-Clustering based Classification Algorithm with Latent Semantic Relationship for Cross-Domain Text Classification through Wikipedia
Conventional approaches to document classification need labeled data to build reliable and accurate classifiers. However, labeled data are rarely available and are usually expensive to obtain. For a learning task in which training data are not available, abundant labeled data may exist in a different but related domain. One would like to use the related labeled data as auxiliary information to accomplish the classification task in the target domain. Recently, the paradigm of transfer learning has been introduced to enable effective learning strategies when the auxiliary data obey a different probability distribution. A co-clustering based classification scheme has been proposed earlier to deal with cross-domain text classification. Here, the idea underlying this approach is extended by making the latent semantic relationship between the two domains explicit. This objective is achieved with the use of Wikipedia. Consequently, the pathway that allows labels to propagate between the two domains captures not only common words but also semantic concepts based on the content of the documents. Empirical results on a variety of real data demonstrate the efficacy of the semantic-based approach to cross-domain classification.
Volume: 7 | Issue: 2
Issue Date: May, 2017