During the selection operation, the DQI computes the maximum hit set, which is the hit set determined by the extreme values for the selected attribute and the current ranges for the other attributes. Then it partitions the maximum hit set into p buckets, one for each user-specifiable value for the current attribute. This is essentially a linear time counting sort of the maximum hit set. We store the size of each bucket and all left-to-right partial sums of these sizes (an example range slider with bucket information included is given in Figure 3).
Figure 3: A sample range slider with bucket information. This example
shows 10 records distributed on the range slider sorted in ascending
order for that attribute. The arrows from the range slider to the hit
set show the related positions in the sorted maximum hit set that each
discrete pixel position of the range slider maps to where the numbers
in this hit set gives the values of the records for that attribute
represented by the slider. The lower array of boxes represents the
partial sums of counts for a given pixel position of the range slider.
This example contains only 7 hypothetical pixels for simplicity.
In selection, we also compute the information that facilitates
computation of histograms and tight coupling of range sliders, i.e.,
the redisplay of all slider ranges when any range is changed. To
achieve this goal a two dimensional array for each slider is kept
(i.e., size
, shown in Figure 4).
Figure: A table used to keep the histogram and the tight coupling
information. This figure uses the same data and example from
Figure 3. Each box in the square holds a count. The user sets the
range for attribute 1. For a given specified range of attribute 1 we
can find the valid range for attribute 2 by projecting the hit set's
bounding rectangle onto attribute 2. Each row is a prefix sum of
counts from left to right. So if we subtract column j from column
i we can deal with the resulting difference array to find the valid
range for attribute 2. The highest nonzero row (k) and the lowest
nonzero row (m) give the valid range for attribute 2. Note that the
histogram information for attribute 2 is just the difference array.
Thus, any time the range slider is updated, even by a large number of pixels (as might happen if the user moves the mouse extremely fast), we can determine the number of hits in constant time. We can also determine ranges for the other sliders in constant time per slider, and histograms for the other sliders in constant time per point. Changes to the display are determined by scanning the buckets between the older attribute value and the new one. The display is updated in time that is linear in the number of changes.
If the user changes the axis parameters for the starfield display, then the hits are redisplayed (projected on the new pair of coordinates) but none of the internal data structures is changed. The only thing that is updated is the starfield information.
The starfield information is equivalent to a two dimensional ``dirty-pixel'' array where for each starfield pixel we keep a count. This count tells us that whether that pixel position is dirty or not and if it is, how many points overlap at that pixel. Whenever a pixel count reaches zero that pixel color is set to the background color (and visa versa i.e., if it was zero and becomes a positive integer then we plot that pixel on the starfield display using the foreground color). A clever programmer can deal with these counts to give another form of histogram information to the user using different scales of different colors. Also we can immediately find the necessary pixels on the starfield display that must be updated (i.e., only the pixel positions where the counts vary).