Using meaningful and stable categories to support exploratory web search:
Two formative studies

Bill Kules and Ben Shneiderman

Department of Computer Science, Human-Computer Interaction Laboratory,
and Institute for Advanced Computer Studies
University of Maryland at College Park, College Park, MD 20742
{wmk,ben}@cs.umd.edu

 

Please send correspondence to Bill Kules (wmk@cs.umd.edu)

(301) 891-7271 (Kules)

(301) 405-2680 (Shneiderman)

(301) 405-6707 (fax)

 


ABSTRACT

Categorizing web search results into comprehensible visual displays using meaningful and stable classifications can support user exploration, understanding, and discovery. We report on two formative studies in the domain of U.S. government web search that investigated how searchers use categorized overviews of search results for complex, exploratory search tasks. The first study compared two overview conditions vs. a control (N=18). The overviews were based on the federal government organizational hierarchy. With the overview conditions, participants noticed missing results more often than participants in the control. They also found pages of interest deeper within the results. The overview conditions received significantly higher subjective ratings. The second study compared an overview based on automated clustering vs. the government hierarchy overview (N=12), and the results suggest that domain knowledge and task influenced the preferred overview. The studies lend support to the use of compact overviews based on meaningful and stable categories tightly coupled with ranked result lists.

Keywords

Web search; exploratory search; search user interfaces; categorization; categorized search results; search result visualization; information seeking; information access; information retrieval; graphical user interfaces; human-computer interaction.

1. INTRODUCTION

The World Wide Web creates tantalizing opportunities for learning and research. Every day, teachers, journalists, researchers and ordinary citizens search the web as they attempt to find, organize, understand, and ultimately learn from information on the web. Although search engines generate long lists of relevant results, the lack of effective overviews challenges users who seek to understand these results. The difficulty is greatest for exploratory search tasks, such as learning about a new topic, which require gaining overviews of and exploring large sets of search results, understanding document context, and identifying unusual features (White, Kules, Drucker, & schraefel, 2006). These users struggle with information overload: an overabundance of information that lacks a comprehensible organization.

This is particularly problematic when users undertake exploratory searches to satisfy information needs that are imprecise or evolving, or when their domain knowledge is limited. Alternatively, they may have a clear question or goal but be uncertain how to gather information to satisfy it. Incompletely formulated queries yield a plethora of potentially relevant search results, which must be examined and understood. The problem is exacerbated by the prevalence of short queries (Spink, Wolfram, Jansen, & Saracevic, 2001). Analysis of search goals suggests that 20% to 30% of all web queries may be exploratory in nature (Rose & Levinson, 2004), which motivates the study of this type of search.

Categorizing web search results into comprehensible visual displays using meaningful and stable classifications can support user exploration, understanding of large result sets, and discovery. Research prototypes and commercial search engines have incorporated category information, but (as discussed in our Related Work section) there have been few user studies of categorized overviews for exploratory web search, and there is little research explaining whether, why and under what circumstances they are effective. Research is needed to justify the entry and maintenance of category metadata and to guide the design of search engine interfaces.

This paper reports on two formative studies conducted on web search within U.S. government web sites. The purpose of both studies was to illuminate searchers’ use of categorized overviews to explore and understand search results. The research goals motivating these two studies include:

  • Identifying search tasks that benefit from categorized search result overviews
  • Understanding how the visual presentation of the overview affects its utility
  • Understanding how the classification (i.e., the set of categories) used for the overview affects its utility and the user’s search experience

In study 1, we compared three presentations of results categorized into a two-level government hierarchy. Two overview+detail interfaces (an expandable outliner and a treemap) allowed users to narrow the search results by category, and a third interface (the control) provided a typical list of results with category information displayed below each result. In study 2, we investigated the effect of alternative classifications, one based on the government organizational hierarchy and the other based on Vivisimo’s automated clustering. The information seeking tasks used in the studies were motivated by our work with government agencies and our understanding of the challenge of finding government information and related publications. In this domain, web sites such as FirstGov (www.firstgov.gov), FedStats (www.fedstats.gov) and other specialized search engines provide some help for searchers. To our knowledge, no search engines currently provide overviews of search results categorized by government agency, even though studies have found that queries for governmental information comprised 1.5% to 3.0% of all queries to general web search engines (Jansen, Spink, & Pedersen, 2005; Spink & Jansen, 2004).
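The categorize-then-narrow interaction described above can be sketched in a few lines of Python. This is a minimal sketch, not the code of the study interfaces: it assumes each ranked result already carries a two-level category (here a hypothetical `department`/`agency` pair), counts results per category to build the overview, and narrows the list to a selected category while preserving the search engine's rank order. The sample URLs and category assignments are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    rank: int        # position in the search engine's ranked list
    url: str
    department: str  # top-level category (hypothetical field name)
    agency: str      # second-level category (hypothetical field name)


def build_overview(results):
    """Count results per department and, within each, per agency."""
    overview = {}
    for r in results:
        agencies = overview.setdefault(r.department, {})
        agencies[r.agency] = agencies.get(r.agency, 0) + 1
    return overview


def narrow(results, department, agency=None):
    """Return only the results in the selected category, in rank order."""
    return [
        r for r in sorted(results, key=lambda r: r.rank)
        if r.department == department and (agency is None or r.agency == agency)
    ]


# Invented sample data: four ranked results with two-level categories.
results = [
    Result(1, "https://example.gov/1", "Commerce", "Census Bureau"),
    Result(2, "https://example.gov/2", "Labor", "Bureau of Labor Statistics"),
    Result(3, "https://example.gov/3", "Commerce", "NOAA"),
    Result(4, "https://example.gov/4", "Commerce", "Census Bureau"),
]

if __name__ == "__main__":
    print(build_overview(results))
    print([r.rank for r in narrow(results, "Commerce", "Census Bureau")])
```

Keeping rank order inside each category means that selecting a category filters the familiar ranked list rather than re-sorting it, which is one way to keep the overview tightly coupled to the results.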

These studies were conducted as part of a research program that is identifying design principles and developing prototypes for the visual display of, and interaction with, categorized search results. The study interfaces (except for Vivisimo) were developed in accord with six emerging principles (Kules & Shneiderman, in process) that draw on the fields of information science, information retrieval, human-computer interaction and information visualization:

  • Provide overviews of large sets of results
  • Organize results by meaningful classifications
  • Tightly couple category labels to the results list
  • Arrange text for scanning/skimming
  • Visually encode quantitative attributes on a stable visual substrate
  • Support multiple visual presentations and classifications
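The fifth principle can be illustrated with a text-only sketch. It assumes a hypothetical fixed category list, `HIERARCHY`, which is not the federal hierarchy used in the studies, and it is not the outliner or treemap used in study 1. Categories are always drawn in the same order, including those with zero hits, so their positions do not shift from query to query; only the bar lengths, which encode per-category result counts, change.

```python
# Hypothetical fixed category order, independent of any particular query.
HIERARCHY = ["Agriculture", "Commerce", "Labor", "Treasury"]


def render_overview(counts, width=20):
    """Render one line per category in fixed order.

    `counts` maps category name to number of hits for the current query.
    Bar length is proportional to the count, scaled so the largest count
    fills `width` characters. Categories missing from `counts` are still
    drawn (with an empty bar) so the layout stays stable.
    """
    max_count = max(counts.values(), default=0)
    lines = []
    for category in HIERARCHY:
        n = counts.get(category, 0)
        bar_len = round(width * n / max_count) if max_count else 0
        lines.append(f"{category:<12} {'#' * bar_len:<{width}} {n}")
    return lines


if __name__ == "__main__":
    # Two different queries produce the same category positions.
    for line in render_overview({"Commerce": 3, "Labor": 1}):
        print(line)
    for line in render_overview({"Treasury": 5}):
        print(line)
```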

The next section briefly discusses previous studies of categorized search results. Sections 3 and 4 describe the two formative studies, and Section 5 discusses the study results. The paper concludes with a summary of our contributions and areas for future work.

2. RELATED WORK

For exploratory searchers, classifications, taxonomies and other knowledge structures support information organization and retrieval, provide semantic roadmaps to fields of knowledge, and improve learning (Soergel, 1999). There is growing use of thesauri on the web to support information retrieval (Shiri & Revie, 2000). Web directories such as Yahoo! (www.yahoo.com) and the Open Directory Project (DMOZ, www.dmoz.org) catalog a small but important fraction of the Web, providing an overview of general Web content and enabling users to find information by browsing a familiar subject hierarchy. These knowledge structures can be used to categorize search results for presentation. The following sections briefly discuss studies of categorized search results for web and non-web search applications.

2.1. Studies of categorized search results for web search

Meaningful and stable categories have been found beneficial for presentation of web search results in the few studies conducted. Grouping search results by a two-level subject classification expedited document retrieval for informational tasks with a single correct answer (Dumais, Cutrell, & Chen, 2001). For question answering tasks, search results augmented with category labels produced the fastest performance and were preferred over results without category labels (Drori & Alon, 2003). The Cha-Cha system organized intranet search results by an automatically generated web site overview. Preliminary evaluations were mixed but promising, particularly for what users considered “hard-to-find information” (Chen, Hearst, Hong, & Lin, 1999). The WebTOC system provides a table of contents visualization that supports search within a web site, although no evaluation of its search capability has been reported (Nation, Plaisant, Marchionini, & Komlodi, 1997).

Clustering web search results into dynamic categories, in which documents are grouped by similarity measures rather than explicit categorical attributes, has been investigated as an alternative to classification and has been shown to improve on ranked lists for information retrieval metrics such as precision and recall (Hearst & Pedersen, 1996; Käki, 2005; Marshall, McDonald, Chen, & Chung, 2004; Zamir & Etzioni, 1999; Zeng, He, Chen, Ma, & Ma, 2004) or task completion time (Turetken & Sharda, 2005). Chen, Houston, Sewell, & Schatz (1998) found that recall improved when searchers were allowed to augment their queries with terms from a thesaurus generated via a clustering-based algorithm. A one-level clustered overview was found helpful when the search engine failed to place desirable web pages high in the ranked results, possibly due to imprecise queries (Käki, 2005). The benefits of clustering include domain independence, scalability, and the potential to capture meaningful themes within a set of documents, although results can be highly variable (Hearst, 1999). Generating meaningful groups and effective labels is a recognized problem (Rivadeneira & Bederson, 2003).
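For contrast with classification, the following sketch groups result snippets purely by textual similarity. It is a deliberately simple single-pass (leader) clustering over word sets using Jaccard similarity with an arbitrary threshold of 0.3; it is not the algorithm used by Vivisimo or by any of the systems cited above. Its naive labels (the most frequent words in each cluster) and its dependence on input order hint at why clustered groups and their labels can be variable.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of terms (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(snippets, threshold=0.3):
    """Single-pass clustering: each snippet joins the first cluster whose
    leader (first member) is similar enough; otherwise it starts a new one.
    The outcome depends on the order in which snippets arrive."""
    clusters = []  # list of (leader_terms, member_indices)
    for i, text in enumerate(snippets):
        terms = set(text.lower().split())
        for leader, members in clusters:
            if jaccard(terms, leader) >= threshold:
                members.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append((terms, [i]))
    return [members for _, members in clusters]


def label(cluster, snippets, k=2):
    """Naive label: the k terms that occur in the most snippets of the cluster."""
    counts = Counter(t for i in cluster for t in set(snippets[i].lower().split()))
    return [t for t, _ in counts.most_common(k)]


if __name__ == "__main__":
    snippets = [
        "census population data",
        "population census tables",
        "weather forecast data",
        "hurricane weather forecast",
    ]
    for c in leader_cluster(snippets):
        print(c, label(c, snippets))
```

Unlike the classification sketch, nothing here guarantees that the groups or labels will be the same for a different query, or even for the same results in a different order.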

2.2. Other studies of categorized search results

The Flamenco system (Hearst et al., 2002; Yee, Swearingen, Li, & Hearst, 2003) provided interfaces to specialized collections (art, architecture and tobacco documents), using faceted hierarchies to produce menus of choices for navigational searching. A usability study compared the interface with a keyword-based search interface to an art and architecture database, using both structured and open-ended exploratory tasks (Yee et al., 2003). With Flamenco, users were more successful at finding relevant images (for the structured tasks) and reported higher subjective ratings (for both the structured and exploratory tasks). The exploratory tasks were evaluated using subjective measures because there was no single correct answer and the goal was not necessarily to optimize a quantitative measure such as task duration. The Dyna-Cat system organized medical search results by a taxonomy of question types (Pratt, Hearst, & Fagan, 1999). In a comparison with clustering and ranked list interfaces, Dyna-Cat helped searchers find more answers to general fact-finding questions within a fixed time. Searchers also felt that they learned more using Dyna-Cat. The SuperBook interface organized search results within a book according to the text’s table of contents, expediting searches without loss of accuracy (Egan et al., 1989). The GRiDL prototype displays search result overviews in a matrix using two hierarchical categories