Data Object and Label Placement For Information Abundant Visualizations
Jia Li#, Catherine Plaisant, Ben Shneiderman*#
Human-Computer Interaction Laboratory
#Department of Computer Science
*Institute for Systems Research
University of Maryland, College Park, MD 20742
{jiali, plaisant, ben}@cs.umd.edu
Aug 27, 1998
ABSTRACT
Placing numerous data objects and their corresponding
labels in limited screen space is a challenging problem in information
visualization systems. Extending map-oriented techniques, this paper describes
static placement algorithms and develops metrics (such as compactness and
labeling rate) as a basis for comparison among these algorithms. A control
panel facilitates user customization by showing the metrics for alternative
algorithms. Dynamic placement techniques that go beyond map-oriented techniques
demonstrate additional possibilities. User actions can lead to selective
display of data objects and their labels.
KEYWORDS
Timelines, data object placement, label placement, information
visualization, control panel, metrics, visual feedback
1. INTRODUCTION
Mapmakers, and now visualization designers, have realized that designing effective presentations for abundant information is a difficult task. Part of the problem is the large number of data objects compared with the limited screen space. Maximizing the display of data content in a comprehensible way is a problem that has been addressed by many researchers.
Mapmakers often turn to larger sheets of paper, but information visualization
designers must work within a limited screen space. However, the dynamics
of zooming, panning, and selective display can be powerful techniques.
Data objects are the essence of visualization systems, so effective layouts are those that present large numbers of them and reveal the semantic relationships among them. Since labels identify and explain the data objects, placing labels directly on and around the data objects presents an integrated information overview. This frees users’ eyes from darting back and forth among scattered elements on the screen, reducing the time users need to comprehend the data.
We found that label placement is a challenging problem with few practical
and satisfactory solutions, because of the following two issues:
2. RELATED WORK
2.1 Data Object Placement
Maximizing the display of data content in the limited
screen space is one of the research goals in information visualization
systems. Decisions need to be made on what data items to display
and how they are laid out so that "users can see all of the possibilities
and navigate among them" [25]. Two approaches have been widely adopted
to address the data layout issue by focusing on "what" and "how"
respectively [15].
These systems do not provide optimal layout strategies for the result set. Good global layouts may not apply well to localized data. The best layout algorithm depends on what information users are currently focused upon [15]. Therefore, users should have control over the layout process so that the resulting layout reflects their current focus [18].
2.2 Label Placement
Label placement has been a fundamental task in cartography and GIS. Over the past 500 years, cartographers have accumulated a great deal of knowledge and many rules about how to make a high-quality map. Imhof [14] illustrates these rules with examples of good and poor labeling. Automatic label placement has been proven to be NP-hard, and it remains a research problem after twenty years of development. Research attention has therefore shifted toward powerful heuristic methods that may not offer guaranteed performance bounds but work acceptably in practice [7, 28].
ArcView, a commercial GIS mapping system, helps users analyze data in a spatial context. Its "Find Best Label Placement" option, combined with the non-overlapping method, works well in sparse scenarios, where it places as many labels as possible (see Figure 1). However, it requires extended computing time even for moderate-sized datasets, and labels are not clearly associated with their data objects. Users are provided with several labeling options: they can auto-label either all the features or a selected set of features, change the font size and style, set the location of labels relative to their features, and allow or disallow overlapping labels. ArcView does not handle overlapping labels effectively: even small overlaps dramatically reduce visibility and overall quality.

Figure 1: ArcView 3.0 map display and label control panel
The Hyperbolic browser [17], on the other hand, makes effective use of overlapped labels, as shown in Figure 2. It provides short and long labels, and users can change the font size easily. Still, the amount of text that the hyperbolic browser displays is a problem. The experimental task used to compare the hyperbolic browser with a conventional 2-D scrolling browser with a horizontal tree layout was particularly sensitive to this problem because of the length and overlap of URLs and the ill-structured nature of the WWW hierarchy [17]. This reveals an important yet easily overlooked factor of label placement: label content.
Interactive TimeLines [2] illustrates poor labeling design: it reduces label legibility and at the same time leads to a low-information graphic design. Generally, words should follow the ordinary writing direction from left to right, the so-called "clockwise direction" or "writing sense" [14].
Figure 2: Hyperbolic Browser
Figure 3: Interactive Timelines
Labeling by brushing [6], a direct manipulation technique developed by Cleveland, selectively labels the data points that interest users. If the brush is set to lasting mode, the labels remain after the brush moves away. This technique works well until so many labels remain on the screen that they start to overlap one another.
Another dynamic labeling technique, text streaming, is proposed in the Bead exploration system [5]: a sample of labels is turned on, then a new sample follows. This successive sampling of labels can present all the details without cluttering the screen. However, it suffers from stability problems, since the changes are abrupt and users cannot foresee the next move.
3. EXPLORING THE LAYOUT DESIGN SPACE
In this section, we describe algorithms and techniques
we have developed to address the placement problems. We implemented them
in the LifeLines visualization system for medical patient records and we
use this system to demonstrate these generally applicable techniques.
3.1 Data Object Placement
Establishing an underlying structure to organize data objects on the screen is a key step towards effective information visualization systems. LifeLines lays out temporal events horizontally along the time axis (x-axis) in 2-D space. When it is applied to patient records, aspects like medical conditions, office visits, hospitalizations or medications are displayed as individual timelines. Line color and thickness illustrate relationships or significance [21]. An empirical study [18] showed that the LifeLines representation leads to faster response times than a textual design for tasks that involve interval comparisons and making inter-categorical connections.
While the starting and ending x-axis values of timelines
are fixed by this structure, the freedom of placing timelines anywhere
in the vertical space leads to a set of layout algorithms that can be designed
to optimize space utilization or reveal more data relationships.
Figure 4a shows the most compact version of LifeLines, called "slow compact". All the events are first sorted by their starting time. For each event, the algorithm searches all the lines from top to bottom for a space where the event fits without overlapping any other event. If no such space is found, a new line is created to hold the event. "Quick compact", on the other hand, skips the sorting step. The default compact layout simply searches only the bottom line for available space.
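The three compaction strategies described above can be sketched as a first-fit search over lines. This is a minimal illustration under stated assumptions, not the LifeLines implementation: the `Event` type and all function names are hypothetical, events are treated as half-open time intervals so that touching events may share a line, and the default layout is assumed to take events in input order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical timeline event occupying the half-open interval [start, end)."""
    name: str
    start: int
    end: int


Line = List[Event]


def fits(line: Line, ev: Event) -> bool:
    """True if ev overlaps no event already placed on this line."""
    return all(ev.end <= other.start or other.end <= ev.start for other in line)


def first_fit(events: List[Event]) -> List[Line]:
    """Search lines top to bottom; open a new line if none has room."""
    lines: List[Line] = []
    for ev in events:
        for line in lines:
            if fits(line, ev):
                line.append(ev)
                break
        else:  # no existing line had room
            lines.append([ev])
    return lines


def slow_compact(events: List[Event]) -> List[Line]:
    # Sort by starting time, then search every line for each event.
    return first_fit(sorted(events, key=lambda e: e.start))


def quick_compact(events: List[Event]) -> List[Line]:
    # Same search, but the sorting step is skipped.
    return first_fit(list(events))


def default_compact(events: List[Event]) -> List[Line]:
    # Only the bottom (most recently created) line is checked.
    lines: List[Line] = []
    for ev in events:
        if lines and fits(lines[-1], ev):
            lines[-1].append(ev)
        else:
            lines.append([ev])
    return lines


events = [Event("p", 0, 3), Event("r", 6, 10), Event("q", 5, 8), Event("f", 2, 6)]
for layout in (default_compact, quick_compact, slow_compact):
    print(layout.__name__, len(layout(events)))
```

With this input the two unsorted variants need three lines while the sorted one needs two. Sorting matters because, when intervals are processed in order of start time, first-fit opens a new line only when every existing line holds an event still active at the new event's start, so it never uses more lines than the maximum number of simultaneously active events.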
Figure 4a: Compact layout
Figure 4b: Chronologically-ordered layout
Figure 4c: Event-name ordered layout
| Data Layout Algorithms | Compactness | Grouping | Occlusion |
|---|---|---|---|
| Default compact | | | |
| Quick compact | | | |
| Slow compact | | | |
| Chronologically ordered | | | |
| Event-name ordered | | | |
Table 1: Metric values for data layouts
Each of these layouts provides certain benefits to users, but no single layout always produces the best result. Compact layouts present a much richer screen when dealing with large records and minimize the need for scrolling. However, the horizontal grouping of events becomes less meaningful. A chronologically-ordered LifeLines helps users review how events evolve over time. Unfortunately, it tends to produce a sparse layout that inevitably requires more scrolling. An event-name ordered LifeLines groups similar events horizontally, so users can gain insight into how many of those events occurred in the past and how frequently. In this case, screen space utilization depends heavily on the data itself.
We believe that research and practice will advance if useful criteria and metrics can be defined to compare layout algorithms. We have developed three metrics, compactness, grouping and occlusion, to capture how well each layout strategy utilizes space and reveals data relationships. Section 4 describes how we incorporate these metrics into the system.
Compactness ranges between 0 and 1, and a larger value indicates a more compact layout. Low compactness is undesirable: it has been suggested that the more data shown within one display, the more effective and comparative the user’s eye can be [27]. However, very high compactness can make the data graphics more difficult for users to comprehend. Establishing lower and upper bounds for this metric would add considerable value in evaluating the effectiveness of data placement algorithms. Table 1 shows the metric values of the five layout algorithms for the same dataset.
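The formula for compactness is not given in this section. As a stand-in only, the sketch below assumes compactness is the fraction of the layout area (number of lines times the displayed time span) that events actually cover. This hypothetical definition ranges between 0 and 1 and grows as a layout gets denser, matching the behavior described above; `compactness` is an illustrative name.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one event


def compactness(lines: List[List[Interval]]) -> float:
    """Hypothetical compactness: covered length / (number of lines * time span).

    Assumes events on the same line do not overlap, so the result lies in [0, 1].
    """
    events = [ev for line in lines for ev in line]
    if not events:
        return 0.0
    span = max(end for _, end in events) - min(start for start, _ in events)
    if span <= 0:
        return 0.0
    covered = sum(end - start for start, end in events)
    return covered / (len(lines) * span)


# Two layouts of the same four events: two lines versus three lines.
two_lines = [[(0, 3), (5, 8)], [(2, 6), (6, 10)]]
three_lines = [[(0, 3), (6, 10)], [(5, 8)], [(2, 6)]]
print(compactness(two_lines), compactness(three_lines))
```

The same events cover 14 time units either way, so the three-line layout scores lower simply because it spreads them over more vertical space.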
3.2 Label Placement
Label placement is a crucial issue when dealing with large numbers of records. Our early prototype, as shown in Figure 5, illustrates the traditional labeling challenges:
Figure 5: LifeLines with poor labeling
Good name positioning aids map reading considerably and enhances the aesthetics of the map [14]. Based on Imhof’s well-known guidelines, we have defined four candidate label positions for LifeLines data items (NE, NW, SE, SW); their preference order is shown in Figure 6.
Figure 6: Candidate Label Positions and their preference orders
Figure 7: LifeLines with improved 4-candidate Labeling
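A greedy version of four-candidate labeling can be sketched as follows: each event, in turn, takes the first candidate position (in preference order) whose label rectangle overlaps no label placed so far, and stays unlabeled otherwise. The preference order of Figure 6 is not reproduced in the text, so the default tuple below is a placeholder; the anchor-point model, the coordinate convention (y grows upward) and the function names are also assumptions. Overlap with the data objects themselves is not checked in this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows upward


def candidate(pos: str, x: float, y: float, w: float, h: float) -> Rect:
    """Rectangle of a w-by-h label at compass position pos of anchor (x, y)."""
    return {
        "NE": (x, y, x + w, y + h),
        "NW": (x - w, y, x, y + h),
        "SE": (x, y - h, x + w, y),
        "SW": (x - w, y - h, x, y),
    }[pos]


def overlaps(a: Rect, b: Rect) -> bool:
    """Strict overlap test; rectangles that only touch do not overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_labels(
    anchors: List[Tuple[float, float]],
    sizes: List[Tuple[float, float]],
    preference: Sequence[str] = ("NE", "SE", "NW", "SW"),  # placeholder order
) -> Dict[int, Tuple[str, Rect]]:
    """Greedy placement; returns {event index: (position, rectangle)}."""
    placed: Dict[int, Tuple[str, Rect]] = {}
    for i, ((x, y), (w, h)) in enumerate(zip(anchors, sizes)):
        for pos in preference:
            rect = candidate(pos, x, y, w, h)
            if not any(overlaps(rect, other) for _, other in placed.values()):
                placed[i] = (pos, rect)
                break  # keep the most preferred free position
    return placed


anchors = [(0, 0), (2, 0), (1, 0), (1, 0)]
sizes = [(3, 1)] * 4
result = place_labels(anchors, sizes)
print({i: pos for i, (pos, _) in result.items()})
```

Here the first three labels land in NE, SE and SW respectively, while the fourth finds all four candidates blocked and is left unlabeled, which is exactly the situation the labeling-rate metric is meant to expose.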
Label connectors, shown in Figure 8a, are thus introduced to link the data objects with their labels and clarify the association. However, connectors lead to a more serious "crowding problem" [22] and a lower data-ink ratio [27], since more ink in the graphics is now devoted to non-data items. To reduce this redundant ink, we introduce a reduced label connection algorithm: a connector links a label to its data object only if the algorithm determines that the data object could be associated with more than one label, i.e., other labels reside within the labeling boundary of the current data object. The labeling boundary in LifeLines is defined as follows:

Figure 8a: LifeLines with label connectors
Figure 8b: LifeLines with reduced label connectors
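The reduced-connector rule can be sketched as below. Because the formal definition of the labeling boundary is not reproduced in this text, the sketch assumes a hypothetical boundary: the object's bounding box grown by a fixed margin. `Rect`, `needs_connector` and `margin` are illustrative names, and "resides in" is read as "intersects".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other: "Rect") -> bool:
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)

    def expanded(self, margin: float) -> "Rect":
        return Rect(self.x1 - margin, self.y1 - margin,
                    self.x2 + margin, self.y2 + margin)


def needs_connector(objects: List[Rect], labels: List[Rect], margin: float) -> List[bool]:
    """objects[i] is the i-th data object and labels[i] its label.

    A connector is drawn for object i only if some *other* label falls inside
    its labeling boundary, assumed here to be its bounding box grown by margin.
    """
    flags = []
    for i, obj in enumerate(objects):
        boundary = obj.expanded(margin)
        flags.append(any(boundary.intersects(lab)
                         for j, lab in enumerate(labels) if j != i))
    return flags


objects = [Rect(0, 0, 4, 1), Rect(10, 0, 14, 1)]
labels = [Rect(0, 1, 3, 2), Rect(4.5, 1, 7, 2)]  # second label drifts toward object 0
print(needs_connector(objects, labels, margin=1))
```

Only the first object gets a connector: its boundary now contains a foreign label, so its own label would otherwise be ambiguous. The second object's label is far from it, but the rule as stated addresses ambiguity only, so no connector is added there.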
Semantic Labeling
Few system designers have explicitly looked at labeling techniques that take into account semantic relationships or patterns among the data objects. Labels explain the data and should therefore reflect these relationships. We present three tactics that capture different data characteristics: importance order, level of detail, and repetitive data.
A. Label Saliency
Saliency is a domain-specific measure of the relative importance or prominence of an event, and can refer either to particular events, characteristics of events, or classes of events [19]. For example, in LifeLines, abnormal events might be more significant, so the labeling algorithm should allocate space to their labels first. Appropriate tools should be provided to let users and domain experts specify the importance order of events.
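One way to let saliency drive label allocation is to process labels in decreasing order of importance, so that when two labels compete for the same space the more salient one wins. The sketch assumes a numeric saliency score per event and a one-dimensional model in which each label occupies an interval on its timeline row; both are illustrative assumptions, as are `Item` and `label_by_saliency`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    name: str
    row: int          # timeline row the label sits on
    x: float          # left edge of the label
    width: float      # label width
    saliency: float   # hypothetical importance score set by a domain expert


def label_by_saliency(items: List[Item]) -> List[str]:
    """Give label space to the most salient items first.

    A label is shown only if it does not overlap a label already shown on the
    same row; ties in saliency keep their input order (sorted() is stable).
    """
    taken: Dict[int, List[Tuple[float, float]]] = {}
    shown: List[str] = []
    for it in sorted(items, key=lambda i: -i.saliency):
        x1, x2 = it.x, it.x + it.width
        row = taken.setdefault(it.row, [])
        if all(x2 <= a or b <= x1 for a, b in row):
            row.append((x1, x2))
            shown.append(it.name)
    return shown


items = [
    Item("routine test", row=0, x=0, width=10, saliency=1),
    Item("abnormal test", row=0, x=5, width=10, saliency=5),
    Item("office visit", row=0, x=20, width=5, saliency=1),
]
print(label_by_saliency(items))
```

The abnormal test claims the contested space even though the routine test appears first in the input, and the non-conflicting office visit is still labeled.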
B. Label Aggregation
Aggregation rules can be established when hierarchical data models are available. In LifeLines, events are grouped into aggregates and aggregates into facets. One such rule is: label aggregates when the space does not permit labeling all the detailed event objects. For instance, as shown in Figure 9, a series of atenolol and propranolol events is aggregated as beta-blockers. A high-level overview of the data set is presented rather than a partial set of individual data objects. Aggregate labels, though they leave out details, cover the complete data set and provide the cues users need to drill down to the details.
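The aggregation rule above can be sketched for a single aggregate as follows. The width test (summing character counts at a fixed character width) is a simplifying assumption standing in for real text measurement, and `choose_labels` is a hypothetical name.

```python
from typing import List


def choose_labels(aggregate: str, event_names: List[str],
                  available_width: float, char_width: float = 1.0) -> List[str]:
    """Return the labels to draw for one aggregate and its detailed events.

    Hypothetical space test: individual labels fit if their total text width,
    plus one character of spacing between neighbours, fits in available_width.
    Otherwise fall back to the aggregate label, or to no label at all.
    """
    gaps = max(len(event_names) - 1, 0)
    needed = char_width * (sum(len(n) for n in event_names) + gaps)
    if needed <= available_width:
        return list(event_names)
    if char_width * len(aggregate) <= available_width:
        return [aggregate]
    return []


drugs = ["atenolol", "propranolol", "atenolol"]
for width in (40, 20, 10):
    print(width, choose_labels("beta-blockers", drugs, width))
```

With ample space the individual drug names are shown; with less, the single "beta-blockers" label still covers the whole series; with very little, nothing fits.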
C. Label Integration
A continuous series of events with the same name attribute can be tagged with a single label, eliminating duplicate text and at the same time freeing screen space for other events. However, if that single label is too far away from some of the events, the association becomes vague again. Therefore, we discard an event's label only if a similar label already resides within its labeling boundary. We applied this technique to the same medical test data sets; the result is illustrated in Figure 10. In the square box, note that the three blood test events share the same label while the event immediately following them does not.
Figure 10: Label Integration
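The integration rule can be sketched in one dimension along the time axis: events are processed left to right, and an event's label is dropped only when an identical label already falls within its labeling boundary. The boundary (the event interval widened by a margin), the character-width model and the names `Event` and `integrate_labels` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    name: str
    start: float
    end: float


def integrate_labels(events: List[Event], char_width: float,
                     margin: float) -> List[Tuple[str, float, float]]:
    """Return (text, x1, x2) for the labels actually drawn, left to right.

    An event's label is dropped when a label with the same text already
    overlaps the event's labeling boundary, assumed here to be the event's
    time interval widened by margin on both sides.
    """
    kept: List[Tuple[str, float, float]] = []
    for ev in sorted(events, key=lambda e: e.start):
        b1, b2 = ev.start - margin, ev.end + margin
        duplicate = any(text == ev.name and x1 < b2 and b1 < x2
                        for text, x1, x2 in kept)
        if not duplicate:
            kept.append((ev.name, ev.start, ev.start + char_width * len(ev.name)))
    return kept


tests = [Event("blood test", s, s + 1) for s in (0, 2, 4, 20)]
print(integrate_labels(tests, char_width=1.0, margin=1.0))
```

The three closely spaced blood tests share one label, while the distant fourth test keeps its own, so the shared label never drifts far from any event it stands for.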
Metrics
We introduced three metrics to compare these labeling algorithms: "labeling rate", "overlapping rate" and "association degree".
The labeling rate ranges between 0 and 1; the higher the value, the more objects are labeled.
The overlapping rate ranges between 0 and 1; the higher the value, the more labels overlap.
When label connectors are used, the association degree is 1. For an object to be clearly associated with its label, its label must not reside in the labeling boundaries of other objects.
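The three metrics can be computed as sketched below. Their exact formulas are not reproduced in this text, so the sketch assumes readings consistent with the descriptions: labeling rate = labeled objects / all objects; overlapping rate = labels overlapping at least one other label / labels drawn; association degree = labeled objects whose label lies outside every other object's labeling boundary / labeled objects, or 1 when connectors are drawn. The labeling boundary is assumed to be the object's bounding box grown by a margin, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other: "Rect") -> bool:
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)

    def expanded(self, margin: float) -> "Rect":
        return Rect(self.x1 - margin, self.y1 - margin,
                    self.x2 + margin, self.y2 + margin)


def labeling_metrics(objects: List[Rect], labels: Dict[int, Rect], margin: float,
                     connectors: bool = False) -> Tuple[float, float, float]:
    """Return (labeling rate, overlapping rate, association degree).

    labels maps an object index to its label rectangle; unlabeled objects are
    absent. The labeling boundary is assumed to be the object grown by margin.
    """
    labeled = list(labels)
    if not objects or not labeled:
        return 0.0, 0.0, 0.0
    labeling_rate = len(labeled) / len(objects)
    overlapped = sum(
        1 for i in labeled
        if any(labels[i].intersects(labels[j]) for j in labeled if j != i))
    overlapping_rate = overlapped / len(labeled)
    if connectors:
        association = 1.0  # connectors make every association explicit
    else:
        clear = sum(
            1 for i in labeled
            if not any(labels[i].intersects(objects[j].expanded(margin))
                       for j in range(len(objects)) if j != i))
        association = clear / len(labeled)
    return labeling_rate, overlapping_rate, association


objects = [Rect(0, 0, 4, 1), Rect(10, 0, 14, 1), Rect(20, 0, 24, 1), Rect(30, 0, 34, 1)]
# Object 3 is unlabeled; object 2's label has drifted next to object 1.
labels = {0: Rect(0, 1, 3, 2), 1: Rect(8, 1, 11, 2), 2: Rect(10, 1.5, 13, 2.5)}
print(labeling_metrics(objects, labels, margin=1))
```

In this example three of four objects are labeled, two of the three labels overlap each other, and the drifted label sits inside another object's boundary, so it counts against the association degree unless connectors are drawn.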
In addition to these three, more metrics should be introduced to capture other important aspects of labeling. However, some of them are difficult to quantify. One good example is "readability". Consider a scenario where very small fonts are chosen for labeling: designers can attain high labeling rates and association degrees, but the labels are useless if they are too small to be read. Readability also plays a crucial role in evaluating the semantic labeling algorithms. Its value, however, depends heavily on users’ perception, and a standard way to quantify it has yet to be found.
3.3 Dynamic Placement Techniques
All the placement algorithms we have presented so far are designed to produce a static data "map" that is highly comprehensible. Exploiting the dynamic and interactive nature of visualization systems opens the door to other useful techniques. For example, moving the mouse over a data object might cause its label to appear, which also clarifies the association. More extensive labeling can be "ballooned" out when the user focuses on a complex object. Another approach is to apply labels only to objects selected by dynamic queries.
A challenge of dynamic placement techniques is to balance display stability against the best use of screen space. When users zoom in, many objects may fall off the screen, and if re-layout is not allowed, space will be underutilized (Figure 11a). At the same time, users have to scroll down to view other data objects. Figure 11b shows the result of applying a re-layout operation, which produces more compact data graphics. However, the change of object locations may detract from users’ comprehension of the structure. The instability may be even more distracting when continuous re-layout is conducted during zooming.
Figure 11a: Zooming without re-layout leaves large blank areas, with only 19 objects visible.
Figure 11b: Zooming with re-layout allows more data objects (29) to be visible.
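The trade-off between stability and space use can be made concrete with a small sketch that counts how many events remain visible after zooming into a time window, with and without re-layout. It assumes a viewport showing only the top `visible_rows` rows and uses first-fit packing for the re-layout; these choices, and all the names, are illustrative assumptions rather than LifeLines behavior.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of an event


def first_fit_rows(events: List[Interval]) -> List[int]:
    """Row index for each event, packing events top to bottom (first fit)."""
    rows: List[List[Interval]] = []
    assignment: List[int] = []
    for s, e in events:
        for r, row in enumerate(rows):
            if all(e <= a or b <= s for a, b in row):
                row.append((s, e))
                assignment.append(r)
                break
        else:
            rows.append([(s, e)])
            assignment.append(len(rows) - 1)
    return assignment


def visible_after_zoom(events: List[Interval], rows: List[int],
                       window: Interval, visible_rows: int, relayout: bool) -> int:
    """Count events visible in the top visible_rows rows after zooming to window.

    Without re-layout, events keep their original rows; with re-layout, the
    events that intersect the window are clipped to it and re-packed.
    """
    t0, t1 = window
    in_window = [(i, (s, e)) for i, (s, e) in enumerate(events) if s < t1 and t0 < e]
    if not relayout:
        return sum(1 for i, _ in in_window if rows[i] < visible_rows)
    clipped = sorted((max(s, t0), min(e, t1)) for _, (s, e) in in_window)
    return sum(1 for r in first_fit_rows(clipped) if r < visible_rows)


events = [(0, 2), (3, 5), (6, 8), (9, 11)]
rows = [0, 1, 2, 3]  # e.g. a sparse, one-event-per-row layout before zooming
for relayout in (False, True):
    print(relayout, visible_after_zoom(events, rows, (2.5, 12), 2, relayout))
```

Without re-layout only one of the three in-window events sits in the visible rows; re-packing brings all three into view, at the cost of moving them away from their original rows.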
4. CONTROL PANEL COUPLED WITH FEEDBACK
Our control panel was designed to let users tailor the system to their preferences, reasoning and goals. Appropriate feedback about the system can help foster user autonomy [11]. Combining the two in the same user interface provides an integrated, informative and predictable environment for users’ decision-making.
As shown in Figure 12a, we incorporate the metrics described in section 3.1 into the LifeLines control panel alongside the data layout options. The metric values are computed dynamically for the current dataset. Armed with these metrics, users may be able to make more appropriate decisions for themselves. Similarly, we provide all the label placement options described previously (Figure 12b). Font size can be changed easily through a value slider, and labels can be truncated via another slider to any number of characters within a predefined range. Feedback on metrics is being added for these controls.

Figure 12a: Control panel with data object placement options
Figure 12b: Control panel with label placement options
5. CONCLUSION
Placement of data objects and their corresponding labels plays an important role in supporting information visualization. We have suggested algorithms and techniques to address these issues. Compact layouts have powerful advantages, but ultimately the screen will become too densely filled to be comprehensible. Therefore attribute-based approaches that allow users to selectively display data objects seem necessary.
We have developed metrics and actively used them in our control panel. Providing feedback about alternative placement algorithms or techniques can enable users to make appropriate choices to match their tasks. We believe that further study will lead to new metrics that will capture other important characteristics of the placement problem.
Static techniques for paper-based layouts should be explored, but the opportunities for dynamic techniques seem great. If task-related user actions can influence the placement of data objects and labels, then the right information can be made to appear more often. For example, if users move the cursor onto an X-ray object, previous X-rays might be highlighted and labeled, inviting the physician to explore them for comparison. If users move the cursor onto a surgical procedure, the notes of the referring physician and the hospital records might be highlighted and labeled, inviting the physician to explore them for background understanding. Additional tasks such as saving objects, navigating among a sequence of objects, and reviewing an entire history suggest other opportunities for dynamic techniques [20].
Designing control panels that give users control over data object and label placement algorithms and techniques is a rich topic that deserves wider attention in the information visualization community.
REFERENCES
Shneiderman, B., Designing the User Interface, Third Edition, Addison Wesley Longman, Inc., Reading, MA, 1998.