Title

A Semantics-Based Method for Clustering of Chinese Web Search Results

Document Type

Article

Publication Date

2014

Publication Source

Enterprise Information Systems

Volume

8

Issue

1

Inclusive pages

147-165

DOI

https://doi.org/10.1080/17517575.2013.857793

Publisher

Taylor & Francis

ISBN/ISSN

1751-7583

Abstract

Information explosion is a critical challenge to the development of modern information systems. This is especially true for information systems deployed over the Internet, where the amount of information on the web has been increasing exponentially. Search engines such as Google and Baidu are essential tools for finding information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results these tools return. By automatically clustering the results into groups by subject, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ a generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing a vocabulary chain. Third, we construct a vector of text features and calculate the snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster the snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
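The pipeline outlined in the abstract can be illustrated with a minimal sketch, under several loudly stated assumptions. It is not the authors' implementation. A quadratic dynamic-programming routine stands in for the generalised suffix tree when extracting LCSs. The HowNet word-similarity step and the vocabulary chain are omitted entirely, because HowNet is an external lexical resource. Snippets become feature-count vectors compared by cosine similarity, and a simple threshold grouping (connected components of the similarity graph) stands in for the improved Chameleon algorithm. All function names (`longest_common_substring`, `extract_features`, `vectorize`, `cosine`, `group_by_threshold`), the parameters `min_len=2` and `threshold=0.5`, and the example snippets are hypothetical.

```python
from itertools import combinations
from math import sqrt


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest common substring of a and b.

    O(len(a) * len(b)) dynamic programming, used here as a simple stand-in
    for a generalised suffix tree. It compares character by character, so
    Chinese text needs no word segmentation.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common run ending here
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def extract_features(snippets, min_len=2):
    """Collect pairwise LCSs of at least min_len characters as features.

    Hypothetical simplification: no HowNet similarity, no vocabulary chain.
    """
    features = set()
    for a, b in combinations(snippets, 2):
        lcs = longest_common_substring(a, b)
        if len(lcs) >= min_len:
            features.add(lcs)
    return sorted(features)


def vectorize(snippet, features):
    """Count non-overlapping occurrences of each feature in the snippet."""
    return [snippet.count(f) for f in features]


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0


def group_by_threshold(vectors, threshold=0.5):
    """Connected components of the graph linking pairs with cosine >= threshold.

    A crude stand-in for clustering, NOT the improved Chameleon algorithm.
    """
    parent = list(range(len(vectors)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    snippets = ["苹果手机价格", "苹果手机评测", "苹果公司股价", "香蕉营养价值"]
    features = extract_features(snippets)
    vectors = [vectorize(s, features) for s in snippets]
    print(features)                     # ['苹果', '苹果手机']
    print(group_by_threshold(vectors))  # [[0, 1, 2], [3]]
```

In this toy run the three "苹果" (Apple) snippets share the extracted substrings and fall into one group, while the unrelated snippet shares no feature of two or more characters and stays alone. The design choice here is purely illustrative: a real system would replace the character-level counts with the semantic similarities the abstract describes.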

Keywords

search engine, Chinese online semantic clustering, vocabulary chain, semantic similarity, Chameleon algorithm

Disciplines

Engineering

