News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages
Due to the rapid growth of text documents, document clustering has become one of the foremost techniques for accurately organizing large quantities of documents into a small number of significant clusters. However, document clustering still faces a number of difficulties, such as high dimensionality, accuracy, meaningful cluster labels, scalability, overlapping clusters, and extracting semantics from texts. Here, semantic relations between phrases are analyzed, and lexical chains are used to characterize those relations. Key phrases are subsequently extracted, and a semantic link graph is built on the lexical chains. This paper presents the recognition and summarization components of the news summarization (NS) system. To test the system, Web news pages with core hints (the subject keywords provided by the news authors) are selected from the 163 website (www.163.com). Experimental results show that this method correctly recognizes Web news pages at a rate of better than 96 percent. They also show that the keyword-extraction method considerably outperforms methods based on term frequency and on lexical chains. Experiments were also conducted on news datasets to evaluate clustering performance. The results indicate that this scheme outperforms influential news document clustering methods in accuracy. As a result, this approach not only provides more general and meaningful labels for documents, but also efficiently produces overlapping news story clusters.
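To illustrate the idea of lexical chains and a semantic link graph mentioned above, the following is a minimal Python sketch. It is not the paper's implementation: the relatedness table `RELATED`, the greedy chaining rule, and the function names `build_lexical_chains`, `key_terms`, and `link_graph` are all assumptions made for illustration. A real system would likely draw semantic relations from a thesaurus or similar lexical resource.

```python
from collections import defaultdict

# Hypothetical toy relatedness table (an assumption for illustration only).
RELATED = {
    frozenset({"election", "vote"}),
    frozenset({"vote", "ballot"}),
    frozenset({"market", "stock"}),
}


def related(a, b):
    """Two terms are related if identical or listed as a pair in RELATED."""
    return a == b or frozenset({a, b}) in RELATED


def build_lexical_chains(terms):
    """Greedily add each term to the first chain holding a related term."""
    chains = []
    for term in terms:
        for chain in chains:
            if any(related(term, member) for member in chain):
                chain.append(term)
                break
        else:
            # No existing chain is related to this term: start a new chain.
            chains.append([term])
    return chains


def key_terms(chains, top_n=1):
    """Return the distinct terms of the top_n longest chains, in order."""
    ranked = sorted(chains, key=len, reverse=True)
    result = []
    for chain in ranked[:top_n]:
        for term in chain:
            if term not in result:
                result.append(term)
    return result


def link_graph(chains):
    """Link distinct, directly related terms that share a chain."""
    graph = defaultdict(set)
    for chain in chains:
        for a in chain:
            for b in chain:
                if a != b and related(a, b):
                    graph[a].add(b)
    return graph


terms = ["election", "market", "vote", "ballot", "stock", "election"]
chains = build_lexical_chains(terms)
graph = link_graph(chains)
print(chains)
print(key_terms(chains))
print(sorted(graph["vote"]))
```

In this sketch the longest chain is taken as the dominant topic of the text, and its terms serve as candidate keywords. Chain length is only one possible scoring choice, used here for simplicity.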
Keywords: Data Mining, Web Mining, Clustering.
Volume: 7 | Issue: 2
Issue Date: May, 2017