Using meaningful and stable categories to support exploratory web search: Two formative studies
Bill Kules and Ben Shneiderman
Department of Computer Science, Human-Computer Interaction Laboratory
Please send correspondence to Bill Kules (wmk@cs.umd.edu)
(301) 891-7271 (Kules), (301) 405-2680 (Shneiderman), (301) 405-6707 (fax)
ABSTRACT
Categorizing web search results into comprehensible visual displays using meaningful and stable classifications can support user exploration, understanding, and discovery. We report on two formative studies in the domain of U.S. government web search that investigated how searchers use categorized overviews of search results for complex, exploratory search tasks. The first study compared two overview conditions vs. a control (N=18). The overviews were based on the federal government organizational hierarchy. With the overview conditions, participants noticed missing results more often than participants in the control. They also found pages of interest deeper within the results. The overview conditions received significantly higher subjective ratings. The second study compared an overview based on automated clustering vs. the government hierarchy overview (N=12), and the results suggest that domain knowledge and task influenced the preferred overview. The studies lend support to the use of compact overviews based on meaningful and stable categories tightly coupled with ranked result lists.
Keywords
Web search; exploratory search; search user interfaces; categorization; categorized search results; search result visualization; information seeking; information access; information retrieval; graphical user interfaces; human-computer interaction.
The World Wide Web creates tantalizing opportunities for learning and research. Every day, teachers, journalists, researchers and ordinary citizens search the web as they attempt to find, organize, understand, and ultimately learn from information. Although search engines generate long lists of relevant results, the lack of effective overviews makes these results hard to understand. The difficulty is greatest for exploratory search tasks such as learning about a new topic, which require gaining overviews of and exploring large sets of search results, understanding document context and identifying unusual features (White, Kules, Drucker, & schraefel, 2006). These users struggle with information overload: an overabundance of information that lacks a comprehensible organization.
This is particularly problematic when users undertake exploratory searches to satisfy information needs that are imprecise or evolving, or when their domain knowledge is limited. Alternatively, users may have a clear question or goal but be uncertain how to gather the information needed to satisfy it. Incompletely formulated queries yield a plethora of potentially relevant search results, which must be examined and understood. The problem is exacerbated by the frequency of short queries (Spink, Wolfram, Jansen, & Saracevic, 2001).
Analysis of search goals suggests that 20-30% of all web queries may be exploratory in nature (Rose & Levinson, 2004), which motivates the study of this type of search.
Categorizing web search results into comprehensible visual displays using meaningful and stable classifications can support user exploration, understanding of large result sets, and discovery. Research prototypes and commercial search engines have incorporated category information, but (as discussed in our Related Work section) there have been few user studies of categorized overviews for exploratory web search, and there is little research explaining whether, why and under what circumstances they are effective. Research is needed to justify the entry and maintenance of category metadata and to guide the design of search engine interfaces.
This paper reports on two formative studies conducted on web search within U.S. government web sites. The purpose of both studies was to illuminate searchers’ use of categorized overviews to explore and understand search results. The research goals motivating these two studies include:
In study 1, we compared three presentations of results categorized into a two-level government hierarchy. Two overview+detail interfaces (an expandable outliner and a treemap) allowed users to narrow the search results by category, and a third interface (the control) provided a typical list of results with category information displayed below each result. In study 2, we investigated the effect of alternate classifications, one based on the government organizational hierarchy and the other based on Vivisimo’s automated clustering. The information seeking tasks used in the studies were motivated by our work with government agencies and our understanding of the challenge of finding government information and related publications. In this domain, web sites such as FirstGov (www.firstgov.gov), FedStats (www.fedstats.gov) and other specialized search engines provide some help for searchers. To our knowledge, no search engine currently provides overviews of search results categorized by government agency, even though studies have found that queries for governmental information comprised 1.5%-3.0% of all queries to general web search engines (Jansen, Spink, & Pedersen, 2005; Spink & Jansen, 2004).
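To make the overview+detail idea concrete, here is a minimal Python sketch (not the code behind the study interfaces) of how a ranked result list might be grouped into a two-level category hierarchy with per-category counts, and then narrowed to a selected category while preserving the engine's rank order. The `Result` type, the functions `build_overview` and `narrow`, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical search result: engine rank plus a two-level category path."""
    rank: int
    title: str
    category: tuple  # (top-level category, second-level category)


def build_overview(results):
    """Count results per top-level and second-level category.

    Categories are ordered by the rank of their best-ranked result, so the
    overview stays consistent with the ranked list it summarizes.
    """
    overview = {}
    for r in sorted(results, key=lambda r: r.rank):
        top, sub = r.category
        node = overview.setdefault(top, {"count": 0, "children": {}})
        node["count"] += 1
        node["children"][sub] = node["children"].get(sub, 0) + 1
    return overview


def narrow(results, top, sub=None):
    """Return only results in the selected category, in the engine's rank order."""
    return [
        r
        for r in sorted(results, key=lambda r: r.rank)
        if r.category[0] == top and (sub is None or r.category[1] == sub)
    ]


# Hypothetical sample data; the agency assignments are illustrative only.
SAMPLE_RESULTS = [
    Result(1, "Population estimates", ("Department of Commerce", "Census Bureau")),
    Result(2, "Unemployment rate", ("Department of Labor", "Bureau of Labor Statistics")),
    Result(3, "American Community Survey", ("Department of Commerce", "Census Bureau")),
    Result(4, "Export statistics", ("Department of Commerce", "International Trade Administration")),
]

if __name__ == "__main__":
    # Print an outliner-style overview: each category with its result count.
    for top, node in build_overview(SAMPLE_RESULTS).items():
        print(f"{top} ({node['count']})")
        for sub, count in node["children"].items():
            print(f"    {sub} ({count})")
    # Selecting a category narrows the ranked list without re-ranking it.
    for r in narrow(SAMPLE_RESULTS, "Department of Commerce", "Census Bureau"):
        print(r.rank, r.title)
```

Selecting a node in an outliner or treemap corresponds to calling `narrow`. Ordering categories by their best-ranked result is only one design choice; ordering alphabetically or by result count are equally plausible alternatives.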
These studies were conducted as part of a research program that is identifying design principles and developing prototypes for the visual display of and interaction with categorized search results. The study interfaces (except for Vivisimo) were developed in accord with six emerging principles (Kules & Shneiderman, in process) that draw on the fields of information science, information retrieval, human-computer interaction and information visualization:
· Provide overviews of large sets of results
· Organize results by meaningful classifications
· Tightly couple category labels to results list
· Arrange text for scanning/skimming
· Visually encode quantitative attributes on a stable visual substrate
· Support multiple visual presentations and classifications
The next section briefly discusses previous studies of categorized search results. Sections 3 and 4 describe the two formative studies, and Section 5 discusses the study results. The paper concludes with a summary of our contributions and areas for future work.
For exploratory searchers, classifications, taxonomies and other knowledge structures support information organization and retrieval, provide semantic roadmaps to fields of knowledge, and improve learning (Soergel, 1999). There is growing use of thesauri on the web to support information retrieval (Shiri & Revie, 2000). Web directories such as Yahoo! (www.yahoo.com) and the Open Directory Project (DMOZ, www.dmoz.org) catalog a small but important fraction of the Web, providing an overview of general Web content and enabling users to find information by browsing a familiar subject hierarchy. These knowledge structures can also be used to categorize search results for presentation. The following sections briefly discuss studies of categorized search results for web and non-web search applications.
In the few studies conducted, meaningful and stable categories have been found beneficial for presenting web search results. Grouping search results by a two-level subject classification expedited document retrieval for informational tasks with a single correct answer (Dumais, Cutrell, & Chen, 2001). For question answering tasks, search results augmented with category labels produced the fastest performance and were preferred over results without category labels (Drori & Alon, 2003). The Cha-Cha system organized intranet search results by an automatically generated web site overview. Preliminary evaluations were mixed but promising, particularly for what users considered “hard-to-find information” (Chen, Hearst, Hong, & Lin, 1999). The WebTOC system provides a table of contents visualization that supports search within a web site, although no evaluation of its search capability has been reported (Nation, Plaisant, Marchionini, & Komlodi, 1997).
Clustering web search results into dynamic categories, in which documents are grouped by similarity measures rather than explicit categorical attributes, has been investigated as an alternative to classification. It has been shown to improve on ranked lists for information retrieval metrics such as precision and recall (Hearst & Pedersen, 1996; Käki, 2005; Marshall, McDonald, Chen, & Chung, 2004; Zamir & Etzioni, 1999; Zeng, He, Chen, Ma, & Ma, 2004) or for task completion time (Turetken & Sharda, 2005). Chen, Houston, Sewell, & Schatz (1998) found that recall improved when searchers were allowed to augment their queries with terms from a thesaurus generated via a clustering-based algorithm. A one-level clustered overview was found helpful when the search engine failed to place desirable web pages high in the ranked results, possibly due to imprecise queries (Käki, 2005). The benefits of clustering include domain independence, scalability, and the potential to capture meaningful themes within a set of documents, although results can be highly variable (Hearst, 1999). Generating meaningful groups and effective labels is a recognized problem (Rivadeneira & Bederson, 2003).
The Flamenco system (Hearst et al., 2002; Yee, Swearingen, Li, & Hearst, 2003) provided interfaces to specialized collections (art, architecture and tobacco documents), using faceted hierarchies to produce menus of choices for navigational searching. A usability study compared the interface with a keyword-based search interface to an art and architecture database, using both structured and open-ended exploratory tasks (Yee et al., 2003). With Flamenco, users were more successful at finding relevant images (for the structured tasks) and reported higher subjective ratings (for both the structured and exploratory tasks). The exploratory tasks were evaluated using subjective measures because there was no single correct answer and the goal was not necessarily to optimize a quantitative measure such as task duration. The Dyna-Cat system organized medical search results by a taxonomy of question types (Pratt, Hearst, & Fagan, 1999). In a comparison with clustering and ranked list interfaces, Dyna-Cat helped searchers find more answers to general fact-finding questions within a fixed time. Searchers also felt that they learned more using Dyna-Cat. The SuperBook interface organized search results within a book according to the text’s table of contents, expediting searches without loss of accuracy (Egan et al., 1989). The GRiDL prototype displays search result overviews in a matrix using two hierarchical categories