
Faculty Sponsor

Dr. Beomjin Kim


Department of Computer Science


Finding relevant sources is a notable challenge when conducting research on a given subject. Current tools, such as online journal databases with keyword-based search, often fall short: they return a plethora of trivial, only tangentially related papers, and they make it difficult to find papers that cross into different domains. This inefficiency may stem from search engines that do not properly use document contents and instead rely heavily on generally superficial metadata, such as author name, title, and user-generated tags. The main goal of our study is to develop a web-based application that enhances users’ search activities through visualization and more effective use of document content. We aim both to make it easier to find the highest-quality sources in a set of documents and to make refining a search toward a specific area of interest a simple task rather than a difficult thought process. This poster presents a two-tier visual interface for searching collections of documents. Tier 1 presents the results of a three-term search, using color, size, and position to show each paper’s attributes in relation to the search. Tier 2 is designed not only to show the contents of the documents but also to aid in refining the search terms for the first tier. It does so by showing related terms, related compound terms, and synonyms. Related terms are terms that appear in the same sentence as a given search term, while compound terms are formed from search terms and their neighbors in the text. Both tiers work on an index of research papers scraped from Google Research. The index is constructed using Lucene, a popular open-source search package. The system makes two major contributions. First, this research is one of the first studies to focus on visual analytics of unstructured content; prior work has focused on either metadata or the content of structured documents such as books. Second, the visualization focuses on representing a large amount of information in a way that leverages user cognition and relieves mental overload. Refinement of the system’s algorithms and interface is ongoing. Usability testing is planned to evaluate the system’s effectiveness, specifically in achieving the project goals. Following the usability testing, we intend to integrate more data into the visualization. For example, document metadata such as conference quality and year are likely to be given visual elements in order to provide a more fully featured search interface. Scoring algorithms and term score cutoffs will also be adjusted to improve user satisfaction.
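The abstract's definitions of related terms and compound terms can be illustrated with a minimal Python sketch. This is not the project's implementation, which is built on a Lucene index. The function names, the regex-based sentence splitting and tokenization, and the tiny stopword list are all assumptions made for illustration, and "neighbors" is read here as immediately adjacent words. Synonyms are not covered.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "for", "with"}


def sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(sentence):
    """Return lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def related_terms(text, term):
    """Count terms that share a sentence with `term` (once per sentence)."""
    term = term.lower()
    counts = Counter()
    for s in sentences(text):
        toks = tokens(s)
        if term in toks:
            counts.update(t for t in set(toks) if t != term and t not in STOPWORDS)
    return counts


def compound_terms(text, term):
    """Count two-word phrases made of `term` and an adjacent non-stopword."""
    term = term.lower()
    counts = Counter()
    for s in sentences(text):
        toks = tokens(s)
        for left, right in zip(toks, toks[1:]):
            if term in (left, right) and left not in STOPWORDS and right not in STOPWORDS:
                counts[f"{left} {right}"] += 1
    return counts
```

Calling `related_terms(document_text, "search").most_common(10)` would give candidate refinement terms for one search term. A production system would more likely derive these counts from the index's analyzed token stream than from raw text.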


Computer Sciences

Web-based Visual Interfaces Designed for Searching on Collections of Research Papers