AppLens and LaunchTile: Two Designs for One-Handed Thumb Use on Small Devices

Amy K. Karlson, Benjamin B. Bederson

Human-Computer Interaction Lab

Computer Science Department

Univ. of Maryland, College Park, MD, 20742

{akk, bederson}@cs.umd.edu

John SanGiovanni

Microsoft Research

One Microsoft Way, Redmond, WA 98052

johnsang@microsoft.com

 

 


ABSTRACT

We have designed two interfaces to support one-handed thumb use for PDAs and cell phones.  Both use Scalable User Interface (ScUI) techniques to support multiple devices with different resolutions and aspect ratios. The designs use variations of zooming interface techniques to provide multiple views of application data:  AppLens uses tabular fisheye to access nine applications, while LaunchTile uses pure zoom to access thirty-six applications. Two sets of thumb gestures represent different philosophies for one-handed interaction. We conducted two studies to evaluate our designs. The first study explored whether users could learn and execute the AppLens gesture set with only minimal training. Participants performed more accurately and efficiently using semantic gestures for directional navigation than abstract gestures for object interaction. A second study gathered user reactions to each interface, as well as comparative preferences.  With minimal exposure to each design, most users favored the tabular fisheye interface.

Author Keywords

One-handed, mobile devices, gestures, notification, Piccolo, thumb navigation, Zoomable User Interfaces (ZUIs).

ACM Classification Keywords

H.5.2. User Interfaces: Input devices and strategies

INTRODUCTION

The current generation of mobile computing hardware features a variety of devices for interaction with the system software. Many of the devices, typically classified as “smartphones”, feature a numeric keypad or miniature thumb keyboard for input together with a ruggedized screen for display output. These devices have the advantage of single-handed interaction, but most smartphones lack a touch-sensitive display, limiting the options for interaction design to simple directional navigation. Another design approach, typically classified as a “Personal Digital Assistant” (PDA) features a touch-sensitive display surface designed primarily to be used with an included stylus. This design offers greater software design flexibility, but many of the small targets designed for a stylus are too small for fingertip actuation, making one-handed use difficult or impossible.

Our goal is to create a new single-handed interaction system for both smartphone and PDA devices. In addition to providing support for varied input hardware, our designs are also informed by Scalable User Interface (ScUI) techniques [2,3]. This design philosophy allows the system to adapt to different screen resolutions, and supports either portrait or landscape device rotation. Such an architecture could allow developers to create individual applications that target a wide variety of screen resolutions. The display size range of this design study was defined to target screens from 2” to 5”, measured diagonally, with resolutions ranging from 176x220 to 800x600. This is accomplished by using the University of Maryland’s Piccolo.NET development toolkit for zooming and scalable user interfaces [5,16].

The secondary goal of this project is to support rich notification from multiple applications. Most current PDA interfaces are designed for focused interaction with a single task or application, with limited consideration or display real estate allocated for notifications (instant messages, SMS/MMS, email, appointment reminders, voicemail) or monitoring of ambient information streams (RSS feeds, web services, stocks, weather, sports scores). In our proposed design, each application has a dynamic information launch tile in the place of a static launch icon. This feature provides the user the ability to glance at the interface and get quick access to current information with little or no device interaction.

The primary focus of this paper addresses the shell and notification area. We approach these goals by describing two designs: AppLens (characterized by zoom+fisheye) and LaunchTile (characterized by zoom+pan). The two approaches employ variations of zooming interface techniques [1] to overview several notification tiles, each roughly the size of a postage stamp. AppLens uses a tabular fisheye approach to provide integrated access to and notification for nine applications.  LaunchTile uses pure zooming within a landscape of thirty-six applications to accomplish the same goals.

Fisheye and pure zoomable techniques both offer promise, but there are no clear guidelines as to when each approach should be used.  So our approach in this work has been to design and build the best interfaces we could with roughly the same functionality, and then compare and contrast the results. In this way, we hope to understand which design direction makes the most sense for this domain and to learn something about the trade-offs between the two approaches.

For device interaction when using a touch-sensitive screen, both designs utilize a basic gestural system for navigation within the notification and application zoom canvas. While this paper does not address single-handed text input techniques, we included a modular InputTile concept in order to support a variety of one-handed input systems, including single- and multi-tap alphanumeric keypad input, as well as miniature thumb keyboards and unistroke input systems executed with a thumb (e.g., Graffiti[6], Quikwriting[15]).

Related Work

Gestures have proven a popular interaction alternative when hardware alone fails to effectively support user tasks, typical of many nontraditional devices, from mobile computers to wall-sized displays [9]. Gestures can be very efficient, combining both command and operand in a single motion, and is space-conserving, reducing the need for software buttons and menus. However, the invisible nature of gestures can make them hard to remember, and recognition errors can negatively impact user satisfaction [13]. Recent research efforts pairing gestures with PDA-sized devices, have emphasized gestures based on changes in device position or orientation [10,19,25]. However, our work more closely resembles the onscreen gestures that have played a prominent role in stylus-based mobile computing. Such efforts can be roughly categorized as application-specific (e.g., Power Browser [7]) or general purpose (e.g., Quikwriting [15]).

Our thumb-as-stylus designs support usage scenarios in which only one hand is available for device operation, such as talking on the phone or carrying a briefcase. Although existing stylus-based gesture systems do not preclude the use of the thumb, we are not aware of any systems that have been specifically designed for the limited precision and extent of the thumb. One handed device interaction has typically focused on text entry techniques, but beyond one-handed text entry on a numeric keypad, most one-handed text entry systems, including those specifically designed for thumb-based entry, require specialized hardware, such as accelerometers [23,27] or customized keyboards [14]. Our one-handed designs target existing hardware platforms and address the general task of system navigation and interaction, in a manner that is both device and application independent. The Jackito PDA [12] supports thumb-only interaction, but presumes two handed use and is not gesture oriented.

The zoom+fisheye approach: AppLens

AppLens provides notification and one-handed access to nine applications, and is strongly motivated by  DateLens, a tabular fisheye calendar [3].  We refer to AppLens as a “shell” application for its role in organizing and managing access to other applications

Generalized Data Access Using Tabular Fisheyes

Spence and Apperley [8] introduced the “bifocal display” as one of the first examples of fisheye distortion applied to computer interfaces. Furnas extended the bifocal display to include cognitive perceptual strategies and introduced a set of analytical functions to automatically compute generalized fisheye effects [8]. Since then, fisheye distortion has been applied with mixed success across a variety of domains, including graphs [22], networks [24], spreadsheets [18], and documents [11,21].

Bederson et al. [3] drew upon prior work in fisheye calendars and tables in developing DateLens, a space-conserving calendar for personal digital assistants (PDAs), which performed favorably for long-term scheduling and planning tasks when compared to a traditional calendar implementation. The strength of DateLens lay partly in the pairing of distortion and scalability, which allowed the details of a single day to be viewed in the context of up to several months of appointment data, and partly in the design of effective representations for the variety of cell sizes and aspect ratios resulting from tabular distortion. One drawback of DateLens, however, was that it required two hands, with many small widgets requiring a stylus to use.  The current designs extend the principles of DateLens to include one-handed thumb access and generalize the design for use across a variety of domains. 

We developed a general framework that provides a grid, tabular layout, and default views for cell contents at a variety of sizes and aspect ratios. We also developed a general API to make it as easy as possible for applications to be built within this framework: developers replace a small number of cell views with representations that are meaningful within the target domain.

AppLens Notification

The AppLens shell has been implemented within our generalized tabular fisheye framework, using a 3x3 grid, and assigning one of nine applications to each cell. The support for tabular layout includes notification at 3 zoom levels: Notification, Context and Full.

Notification distributes the available screen real estate equally among the 9 application tiles (Figure 1a). One tile remains reserved for settings, which can be used to configure the selection of applications which occupy the other 8 notification tiles. Generally, tiles at Notification size display high level static and/or dynamic application-specific notification information.

Context zoom allocates roughly half the available display area to a single focus tile, compressing the remaining tiles according to a tabular fisheye distortion technique [3,18] (Figure 1c). A tile at Context size typically appears much like a fully functional application, but selectively displays features to accommodate display constraints, and is not interactive. Tiles on the periphery of a Context zoom, or peripheral tiles, may be rendered quite differently depending on their position relative to the focus tile, which dictates whether the peripheral tile is a square, a wide-flat rectangle, or a narrow-tall rectangle. To reduce visual overload at Context size, peripheral tiles are displayed at 40% transparency. The contents of distorted peripheral tiles are not themselves distorted, but rather change representation to provide the most meaning in the space available.

The third and final Full zoom level expands a tile to a fully interactive application that occupies 100% of the display (Figure 1b).

Gesture-Based Cursor Navigation

Existing application designs for PDAs are often inappropriate for one-handed use due to their reliance on screen regions that typically cannot be reached while maintaining control of the device (e.g., accessing the Start menu in the upper left-hand corner of a display while holding the device in the right hand), and the use of standard widgets that are too small for reliable thumb use (e.g., radio buttons, checkboxes, and on-screen keyboards). LaunchTile solves this problem by redesigning application access and interaction such that all targets are both large and within thumb reach. In support of more traditional designs, AppLens instead uses an object cursor, depicted as a dynamically-sized rectangular orange border that users move from one on-screen object to another via command gestures issued anywhere on the PDA screen. Although PDAs do not traditionally support cursors as desktops do, our cursor identifies the on-screen object that is the current interaction target. However, cursors are not new to PDA interface design: the WebThumb [28] web browser includes a similar notion of cursor, but which is controlled via directional hardware, and others [26] have explored device tilting to manipulate PDA cursors.

Neither the cursor nor gestures interfere with the most common stylus interactions of tap and tap+hold. Although gestures do overlap stylus drag commands, dragging is rarely used in applications and could be distinguished from gestures by explicitly setting a gesture input mode.

We established a core set of commands that would allow users to navigate applications using only the input cursor. Our command language supports directional navigation (up, down, left, right) as well as two widget interaction commands: one equivalent to a stylus tap (activate), and the other which negates activation (cancel), equivalent to tapping the stylus outside the target widget. We also include the convenience commands forward and backward, equivalent to tab and alt-tab on Windows PCs.

     

                     (a)                                                   (b)

                          (c)                                                (d)

Figure 1. AppLens Zoom Levels: (a) Notification, (b) Full, (c, d) Context.

Command Gestures

We based the gesture set on the limited abilities of and range of motion of thumbs (Figure 2), and we had a goal of creating a gesture language that could be learned with minimal training. After informally experimenting with a variety of gestures, we developed a simple set of gestures with the aim of maximize memorability and robustness of execution (Figure 3). We assigned semantic mappings for the directional commands up, down, left and right. Two gestures defined by pivoting the thumb from bottom to top and top to bottom we assigned to activate and cancel respectively, both to reinforce their opposing relationship as well as for ergonomic ease in issuing common commands.  Finally, we assigned the upper-left to lower-right diagonal to forward due to its relative forward nature, and by similar reasoning, the reverse gesture to backward. Each gesture is uniquely defined by a slope and direction, or vector, which allows gestures to be robust and highly personalizable: users can issue gestures of any length (beyond an activation threshold of 20 pixels) anywhere on the screen surface.

The eight commands corresponding to the gestures can also be activated with a numeric keypad that corresponds  to the location of the endpoint of the gesture: 1-backward, 3-activate, 7-cancel, and 9-forward. Since smartphones have a joystick which can be used for up, down, left and right, the numeric keypad isn’t necessary, but the 2,4,6 and 8 keys can also be used for movement in the corresponding directions.

            

Figure 2. Screen area            Figure 3. The gesture set.
accessible with one hand.

Using Command Gestures within AppLens

AppLens zoom levels can be changed using activate and cancel gesture commands. The activate command gesture propagates the equivalent of a stylus tap to the current input target, hence its effects are target-specific. The activate gesture zooms in, animating the layout first from Notification to Context zoom, and then to Full zoom. Once at Full zoom, the input cursor will transition to the objects within the application, at which point the command gestures will affect the current target, not the application itself. In this way the AppLens architecture is hierarchical. The cancel command acts to negate the effects of the activate command. At Full zoom, the effects of the cancel command depend on the location of the cursor and the state of its target. If the current target is an internal target in an activated state, the cancel command deactivates the target.  If, however, the target is in an inactive state, the cancel command causes the application tile to animate from Full zoom to Context Zoom, and if issued again, to Notification zoom.

The Zoom+Pan Approach: LaunchTile

Our second design, LaunchTile proposes another way to interact with a grid of notification tiles. The primary shell of the LaunchTile design is an interactive zoomspace consisting of 36 notification tiles, divided into 9 zones of 4 tiles each. As a central design element, LaunchTile uses a large blue onscreen button (called Blue) to provide a consistent point of reference for zooming and panning navigation, and to unify the shell and applications with a consistent visual metaphor. Onscreen tiles, menus, and functions maintain a consistent relative position to Blue.

Home Screen

The Home screen of LaunchTile divides the screen area into 4 equally-sized notification tiles (Figure 4a). The Home screen represents the 4 center-most tiles of the 36-tile zoomspace. To see other tiles, the user pans around the space, and the system snaps to each 4-tile group, called a Zone. In this paper, the zones are described with a numerical designator from 1 to 9, starting in the upper left, and going across. This maps to the numbers on a conventional telephone keypad. Zone 5 is the center zone, which defines the home screen, and shows the 4 highest priority notification tiles, as configured by the user.

Panning Techniques

To support different input hardware and various styles of interaction, there are several ways to pan around the zoomspace. If the device has a multidirectional control joystick, the user can push it in the direction of the targeted zone. This design permits the user to quickly glance at the 4 highest priority tiles with no interaction, and the 16 tiles in zones 2, 4, 6, and 8 are a single tap away. As many directional pads do not support diagonal action, the 16 additional tiles in the corner zones 1, 3, 7, and 9 are two taps away. As with AppLens, the zoomspace does not wrap. There are also three other panning techniques. If the device has a  touch-sensitive display, the user can use his or her thumb to drag the canvas around. When doing so, dragging action is “on rails”, permitting the user to drag vertically and horizontally, but not diagonally. When panning, Blue remains stationary, and snaps the view to the center of each zone, represented by an empty circular space in the hub of 4 tiles in the zone. These virtual guides help keep the user from getting caught between zones.

Within each 4-tile zone, we’ve also designed an indicator widget to show the user’s relative location within the zoomspace. The indicator has two components. First, directional arrows show where other zones are. If users only see indicators pointing up and right, they know they are in zone 7. Next to the directional arrows are blue dots that represent all zones not currently in view. The blue dots could also be used to indicate an alert or status change in a neighboring zone, though this feature was not implemented in our prototype. The fourth way to pan around the zoomspace is to press this directional indicator itself. An oversized hit target ensures that the user can easily hit this without using a stylus.

 

 

                           (a)                                               (b)

     

                         (c)                                                    (d)

Figure 4. Three LaunchTile zoom levels: (a, b) Home, (c) World View, (d) Application

Zooming out to the World View

From any one of the nine 4-tile zones, the user also has the option to zoom out to view the entire 36-tile zoomspace (Figure 4c). Since all 36 tiles are visible at once, this view reduces each tile to a small launch icon or miniature visualization. From this view, the user can clearly see the absolute location of each tile. This view, called the World View, is a way to monitor all applications at once. In the World view, the display is divided into a grid of 9 hit targets, each mapping a 4-tile zone. Single-tapping a zone animates into that zone, and displays the zone’s 4 notification tiles.

Zooming In to an Application

From any zone, the user taps any of the 4 notification tiles to launch the corresponding application. The point of view follows a zooming animation further into the zoomspace until the target application fills the entire display (Figure 4d). If the device has only a numeric keypad (no touchscreen), the user simply presses the numeric key in the corner that corresponds to the zone. Pressing 1 launches the upper left tile, 3 launches the upper right, 7 launches the lower left, 9 launches the lower right. This provides quick, single-tap access to each visible tile. This technique is inspired by ZoneZoom by Robbins et al. [20].

As the system zooms, Blue stays in view and retains its function as a central point of reference. Application menu commands are represented as on-screen buttons clustered around Blue, now positioned at the bottom of the display. Each menu button has its appropriate numeric label to further illustrate its mapping to a mobile phone keypad. This allows a smartphone user to access application commands by pressing the corresponding keypad number.

A visual indicator to the left of the zoomspace animates during interaction to reflect the user’s current absolute zoom level within the LaunchTile environment.

Zoom Control

In every screen but Home, the user presses Blue to move deeper into the object hierarchy. A dedicated Back button, both on-screen and in hardware, takes the user back out. In the Home screen, Blue toggles between the two different notification views, Home (4 tiles are visible) and World View (all 36 tiles are visible). Once an application is launched, three dedicated software buttons along the top edge of the screen support inter- and intra-application navigation. A green Home button immediately zooms the view out to Home view. There is also a Back button on the upper right edge of the screen, and another global command key. The placeholder for this function in our prototype is an icon for voice command and control. On a non-touchscreen device, Back and Home commands are executed with dedicated physical buttons, such as those provided on a smartphone device.

Application-Level Interaction

Although the focus of these designs was notification and shell interaction, we continued our interaction philosophy throughout the depth of the zoomspace, into the application level. In the LaunchTile prototype, we attempted to make application-level interaction consistent with navigation among the notification tiles. Within an application, several gestures are supported which are designed specifically for single-handed navigation and item selection. Previously, others have demonstrated that for direct-manipulation interfaces, a grounded tap-drag-select-release technique is more accurate than a tap-to-select [17]. As such, we made all LaunchTile tap-to-select targets at least 35x35 pixels. In those areas where limited display real estate necessitates smaller targets, the central Blue widget becomes a moveable tool glass which can be positioned over the desired object, email, or text. The large thumb-friendly drag target is offset below the selection area, to keep the user’s thumb from occluding the target. Alternatively, the user can drag anywhere around Blue to move the application space. This technique can be used to pan around a map, scroll a list of emails, or to navigate to text outside the view area. Together, these two interaction techniques permit the user to address a large application zoomspace, yet bring focus to a target much smaller than a finger. A non-touchscreen user simply uses the multidirectional joystick to position the selection area on the screen. Keys 2 and 8 are used for page control up & down, and 4 and 6 can be either menu items or horizontal page control, as appropriate.

Once the targeted item is in the toolglass selection area, a tap on Blue (also supported through hardware via the 5 key) replaces the application menus with a context-sensitive menu, now clustered immediately around the selection area. Like the menu items, each key on a numeric keypad executes its function as relative to Blue. A second tap on Blue sends the user to the deepest level of the zoomspace, text input. At this level, a modular text input object called an InputTile appears around the selection area for alphanumeric input. If the user has a mini qwerty keyboard, they can alternatively use this for directly entering text. With this design, a double tap on Blue while text is selected will bypass the context menu and take the user directly into text edit mode.

Implementation

The AppLens and LaunchTile prototypes have been built using the University of Maryland’s PocketPiccolo.NET development toolkit for Zoomable User Interfaces (ZUIs) [5,16]. Although the primary development and test platform has been the HP iPAQ PocketPC running Microsoft Windows Mobile 2003, both run unmodified on the iMate Smartphone II running Windows Mobile Smartphone 2003 (Figures 1d and 4b).

Although the core architecture and gesture recognition for each shell has been implemented, applications have been simulated with images. This has allowed us to put designs that are faithful to the look and feel of each shell in the hands of users for early feedback, but falls short of full interactivity. However, because LaunchTile design principles extend to the applications themselves, we have included email and mapping as two interactive examples within the LaunchTile prototype.

studies

We conducted two studies to inform future research directions. The first study explored whether users could learn and perform the AppLens gesture set with only minimal training. The second was a usability study which gathered user reactions to each shell design, as well as users’ comparative preferences.

Applens Gesture Study

A significant difference between the gesture designs of each shell is that all gestures in LaunchTile are direct manipulation, while all gestures in AppLens are remote-control commands. In LaunchTile, users physically drag objects with their thumbs, be it the zoomspace, a toolglass, a map, or a list of email headers.  In AppLens, gestures only affect the cursor target, rather than the objects over which the gesture is performed. The success of AppLens therefore depends as much on the utility of the tabular design as the ability for users to execute the gesture command set.  Participants were tested on gesture execution performance and efficiency in navigating an information space.

Participants: 20 participants (12 male, 8 female) were recruited from the general population with the only selection criteria being that participants were right-handed.  The median participant age was 30, and while 12 of the participants considered themselves advanced computer users, only 6 regularly used a PDA.

Materials: The study was run on an HP iPAQ measuring 4.5x2.8x0.5 inches with a 3.5 inch screen.  We constructed a software test environment modeled after AppLens: a hierarchical tabular information space, with each level in the hierarchy represented by a 3x3 grid of equally-distributed cells, numbered 1-9 like a smartphone keypad.  Context zoom was eliminated from the test environment, so that activating a cell animated the display to the next level. Cell labels served as navigational landmarks, each labeled with a hierarchical address constructed by prepending the cell’s local logical number (1-9) with its parent’s label, separated by a dot.  We reserved an area at the top of the screen for task instructions, and disabled tapping to restrict input to gestures alone.   Gestures retained their meaning from AppLens: Navigational gestures left, right, up and down controlled the movement of an orange rectangular cursor within a level; activate zoomed in to the next level and cancel zoomed out to the previous level.  Within the context of the test environment, forward and backward were used for intra-cell selection, allowing participants to cycle a highlight either forward or backward through cell label digits. Highlighted digits could then be bolded or “activated” using the activate gesture or un-bolded or “deactivated” using the cancel gesture.  Participants were provided with a reference sheet that described the hierarchical information space and labeling scheme, as well as the eight gestures to be used for navigation and interaction.

Tasks:  Participants performed two types of tasks.  Gesture  tasks involved performing a gesture given the associated command name.  Navigation tasks required participants to navigate to a particular cell in the hierarchy and/or to activate or deactivate a cell label digit.

Measures: Application logs recorded task time, the gestures performed for each task, and whether the task was completed correctly. Participants were instructed to press a hardware button between tasks, which served to record task time and advance to the next task. Due to a bug in our logging software, task time was recorded at second rather than millisecond resolution.  However, since our goal was to identify performance trends rather than comparison to another input method, one second resolution was sufficient. Participants rated their experience using a 9-point Likert scale: (1) frustrating – satisfying, (2) hard to learn – easy to learn, (3) hard to use – easy to use, (4) slow – fast, and (5) boring – fun.

Procedure:  After reading a description of the test environment and gestures, participants performed 16 practice tasks similar in nature to those in the navigation task phase. Participants practiced for 5-10 minutes.

After the practice phase, participants performed gesture tasks: presented with a command name, participants performed the associated gesture and pressed a hardware button to advance to the next task. Command names were presented to participants in random order, four times for each of the eight commands. The gesture reference sheet was placed face down so that the administrator could record the number of times it was looked at.

The second test phase required participants to perform goal-directed tasks within the information space. These included navigation, activation, navigation+activation, and cancellation tasks. Participants then recorded their subjective ratings of the interaction experience.

Results

In the gesture phase of the study, participants correctly performed gestures an average of 87% of the time. However, looking at gesture type, we see that participants correctly performed directional gestures 93% of the time, but had more difficulty with diagonal gestures activate and cancel at 88% and 85% respectively. backward and forward had the worst success rate, at 70% and 64% of the time respectively.  Time to perform gestures followed a similar trend.  On average, participants required 2.4 seconds to perform each gesture: 1.5 – 1.7 seconds for directional gestures, 2.6 – 2.8 seconds for activate and cancel gestures, and 3.6 – 3.7 seconds for backward and forward gestures respectively.  Neither measure correlated with comput