VIII. Dimensionality Reduction

Filter
A straightforward way to reduce the number of elements shown.
- In an interactive vis context, filtering is often accomplished through dynamic queries.
- Filtering can be applied to both items and attributes.
- Item Filtering: Eliminate items based on their values with respect to specific attributes. Fewer items are shown, but the number of attributes shown does not change.
- Attribute Filtering: The goal is to eliminate attributes rather than items; that is, to show the same number of items, but fewer attributes for each item.
- According to what? Any possible function that partitions the dataset into two sets
- Attribute value bigger/smaller than X
- Noise/Signal
- Pro and Con
- Pro: straightforward and intuitive to understand and compute
- Con: out of sight, out of mind
<aside>
⚠️
Problems:
- Filtering can exclude valuable data and so narrow the analysis. Although filtering is intuitive and easy to implement, users may struggle to select meaningful filter ranges, especially for unfamiliar datasets.
- Once data is filtered out, it is "out of sight, out of mind": you lose visibility of data that you might need later, which can affect insights.
→ Solution: A dynamic querying system can be implemented where the visual encoding and interaction are tightly coupled
</aside>
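The difference between item filtering and attribute filtering can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the toy table, the attribute names, and the helper functions `filter_items`, `filter_attributes`, and `partition` are all hypothetical. `partition` returns both halves of the split, which is one possible way to keep filtered-out items retrievable.

```python
# Hypothetical toy table: each item is a dict mapping attribute -> value.
items = [
    {"name": "A", "price": 12.0, "weight": 3.1, "rating": 4.5},
    {"name": "B", "price": 48.0, "weight": 0.9, "rating": 3.8},
    {"name": "C", "price": 7.5, "weight": 5.4, "rating": 4.9},
    {"name": "D", "price": 30.0, "weight": 2.2, "rating": 2.1},
]


def filter_items(table, predicate):
    """Item filtering: fewer items, same attributes per item."""
    return [item for item in table if predicate(item)]


def filter_attributes(table, keep):
    """Attribute filtering: same items, fewer attributes per item."""
    return [{key: item[key] for key in keep} for item in table]


def partition(table, predicate):
    """Split the table into (shown, hidden) instead of discarding items."""
    shown = [item for item in table if predicate(item)]
    hidden = [item for item in table if not predicate(item)]
    return shown, hidden


# "Attribute value smaller than X" as the filter function.
cheap = filter_items(items, lambda item: item["price"] < 20)
slim = filter_attributes(items, ["name", "rating"])
shown, hidden = partition(items, lambda item: item["price"] < 20)
```

In a dynamic query, the threshold (here `20`) would be bound to a slider or similar control and the filter re-run on every change.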
Statistical Aggregation
A group of elements is represented by a new derived element that stands in for the entire group. A very simple aggregation is computing an average; the four other basic aggregations are minimum, maximum, count, and sum.
- As with filtering, aggregation can be used for both items and attributes.
- Pro and Con
- Pro: inform about whole set
- Con: difficult to avoid losing signal
<aside>
⚠️
Problems:
- Aggregation merges multiple elements into a new derived element, which can result in the loss of signal (important information) during summarization.
- Specific patterns or outliers can be hidden, leading to potential misinterpretations of the data.
→ Solution: Interactive aggregation allows users to adjust the level of aggregation dynamically, helping to visualize data at different granularities while avoiding loss of signal.
</aside>
- Most common aggregation strategies: statistical plotting is a way of reducing the amount of data to be mapped onto graphics primitives while trying to preserve the important information.
- Histograms
- Box Plots
- Violin Plots
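The five basic aggregations, and the way they can lose signal, can be illustrated with a small sketch. The group names and values below are made up for illustration, and `aggregate` is a hypothetical helper: the two groups share the same count, sum, and mean even though one of them contains an obvious outlier.

```python
from statistics import mean


def aggregate(values):
    """Replace a group of values by one derived element (five basic aggregates)."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical groups: "south" hides an outlier (39) behind the same mean.
groups = {
    "north": [10, 12, 11, 13],
    "south": [2, 3, 2, 39],
}

summary = {name: aggregate(values) for name, values in groups.items()}
same_mean = summary["north"]["mean"] == summary["south"]["mean"]
```

Here only the minimum and maximum reveal that the groups differ, which is why showing more than one aggregate, or letting the user change the aggregation level, helps to avoid misreading the data.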
Idiom: Histograms - static item aggregation

| Idiom | Histograms |
| --- | --- |
| What: Data | Table |
| What: Derived | New table: keys are bins, values are counts |
| Why: Task | Find distributions |
- Bin size is crucial
- The pattern can change dramatically depending on the discretization (how the values are grouped into bins)
- Opportunity for interaction: control the bin size on the fly (in real time)
- Rules of thumb:
- Number of bins = $\sqrt{n}$
- Number of bins = $\log_2(n) + 1$
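The two rules of thumb and the derived "bins → counts" table can be sketched as follows. This is a minimal sketch under stated assumptions: rounding up to a whole number of bins is an assumption (the rules above do not say how to round), equal-width bins are assumed, and the sample values and the function names `bins_sqrt`, `bins_log2`, and `histogram` are hypothetical.

```python
import math


def bins_sqrt(n):
    """Square-root rule: sqrt(n) bins, rounded up (rounding is an assumption)."""
    return math.ceil(math.sqrt(n))


def bins_log2(n):
    """Log rule: log2(n) + 1 bins, rounded up (rounding is an assumption)."""
    return math.ceil(math.log2(n)) + 1


def histogram(values, num_bins):
    """Derived table as a list: index = bin, value = count (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # guard against zero width
    counts = [0] * num_bins
    for value in values:
        # Clamp so that the maximum value falls into the last bin.
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample with two groups of values and a gap in between.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 10, 10, 11, 11, 11, 12]

coarse = histogram(values, 2)                    # two bins: gap is invisible
finer = histogram(values, bins_sqrt(len(values)))  # sqrt rule: gap shows up
```

Comparing `coarse` and `finer` shows how the visible pattern depends on the discretization, which is the motivation for letting the user control the bin size interactively.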
Idiom: Box Plots - static item aggregation

| Idiom | Box Plots |
| --- | --- |
| What: Data | Table |
| What: Derived | 5 quantitative attributes: median (central line), lower and upper quartiles (box), lower and upper fences (whiskers); outliers beyond the fence cutoffs are shown explicitly |
| Why: Task | Find distributions |
| Scale | Unlimited number of items |
- Good for normally distributed data
- Bad for non-normal distributions
- Really bad for bimodal or multimodal distributions
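The derived attributes of a box plot can be computed with the standard library, as sketched below. Two assumptions are not stated in the notes: the fence cutoffs are placed at 1.5 × IQR beyond the quartiles (a common convention), and the whiskers end at the most extreme data values inside those cutoffs. The function name `box_plot_stats` and both samples are hypothetical.

```python
from statistics import quantiles


def box_plot_stats(values, k=1.5):
    """Median, quartiles, whisker ends, and outliers for one attribute."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_cut, high_cut = q1 - k * iqr, q3 + k * iqr  # assumed fence cutoffs
    inside = [v for v in data if low_cut <= v <= high_cut]
    return {
        "median": q2,
        "q1": q1,
        "q3": q3,
        "lower_whisker": min(inside),  # most extreme value within the cutoffs
        "upper_whisker": max(inside),
        "outliers": [v for v in data if v < low_cut or v > high_cut],
    }


# Hypothetical unimodal sample with one outlier.
skewed = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# Hypothetical bimodal sample: two groups around 2 and 10, nothing in between.
bimodal_values = [1, 1, 2, 2, 2, 3, 9, 9, 10, 10, 10, 11]
bimodal = box_plot_stats(bimodal_values)
```

For the bimodal sample, the median lands in the empty gap between the two groups, and the box gives no hint that there are two modes. This is the reason box plots are a poor fit for bimodal or multimodal distributions.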
Idiom: Violin Plots

Combine the features of box plots and probability density functions, showing both the summary statistics and the shape of the distribution.
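What a violin plot adds to the box plot is an estimate of the density curve. One common way to obtain such a curve is Gaussian kernel density estimation, sketched here in plain Python. The kernel choice, the default bandwidth rule ($h = 1.06\,\sigma\,n^{-1/5}$, a normal-reference rule of thumb), the grid size, and the function names are all assumptions made for illustration; plotting libraries may use different defaults.

```python
import math
from statistics import median, stdev


def gaussian_kde(values, bandwidth):
    """Return a function x -> estimated density (Gaussian kernels)."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


def violin_profile(values, grid_size=50, bandwidth=None):
    """Summary statistic plus density curve: the data behind one violin."""
    if bandwidth is None:
        # Assumed rule of thumb for the bandwidth: 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * stdev(values) * len(values) ** (-1 / 5)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = gaussian_kde(values, bandwidth)
    return {
        "median": median(values),
        "grid": grid,
        "density": [density(x) for x in grid],
    }


# Hypothetical bimodal sample: the density curve shows what the box hides.
sample = [1, 1, 2, 2, 2, 3, 9, 9, 10, 10, 10, 11]
profile = violin_profile(sample)
curve = gaussian_kde(sample, bandwidth=1.0)
```

With this sample, `curve(2)` and `curve(10)` are both larger than `curve(6)`, so the two modes remain visible in the violin outline even though the median sits between them.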
Idiom: Density plots
Idiom: Continuous scatterplots

2D density can be represented with the continuous scatterplot idiom: a density map, derived from a scatterplot, in which color encodes density.
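A much simplified version of this idea is to count scatterplot points per grid cell and map each count to a color value. The sketch below is only a binned approximation under that assumption, not the full continuous scatterplot technique; the grid resolution, the grayscale mapping, and the function names `density_grid` and `to_gray` are hypothetical.

```python
def density_grid(points, nx, ny):
    """Count 2D points per cell of an nx-by-ny grid (rows = y, columns = x)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, y_lo = min(xs), min(ys)
    x_width = (max(xs) - x_lo) / nx or 1.0  # guard against zero width
    y_width = (max(ys) - y_lo) / ny or 1.0
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Clamp so that maximum coordinates fall into the last cell.
        col = min(int((x - x_lo) / x_width), nx - 1)
        row = min(int((y - y_lo) / y_width), ny - 1)
        grid[row][col] += 1
    return grid


def to_gray(grid):
    """Map counts to 0..255 gray levels, with the densest cell at 255."""
    peak = max(max(row) for row in grid)
    return [[round(255 * count / peak) for count in row] for row in grid]


# Hypothetical scatterplot: three points close together, one far away.
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.1), (1.0, 1.0)]
grid = density_grid(points, nx=2, ny=2)
gray = to_gray(grid)
```

The resulting matrix can be drawn as an image or heatmap, so overplotted regions read as darker or brighter areas instead of as indistinguishable piles of marks.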
Clustering
- Classification of items into similar bins, not using pre-existing categories
- Based on a similarity measure, e.g., Euclidean distance or Pearson correlation
- Partitioning algorithms
- Divide data into set of bins
- Number of bins (k) set manually or automatically
- Hierarchical algorithms
- Produce a "similarity tree" (dendrogram): the cluster hierarchy
- Agglomerative clustering: start with each node as its own cluster, then iteratively merge
- Cluster hierarchy: derived data used with many dynamic aggregation idioms
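The agglomerative procedure described above can be sketched as follows: every point starts as its own cluster, and the two closest clusters are merged repeatedly. Several choices here are assumptions rather than part of the notes: Euclidean distance as the (dis)similarity measure, single linkage (distance between the closest members) as the merge criterion, and a naive search over all cluster pairs. The function name `agglomerative` and the sample points are hypothetical.

```python
import math


def agglomerative(points, k=1):
    """Single-linkage agglomerative clustering.

    Returns (clusters, merges): clusters holds lists of point indices once
    only k clusters remain; merges records (id_a, id_b, distance, new_id)
    for every merge, which is the information a dendrogram is drawn from.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> members
    merges = []
    next_id = len(points)
    while len(clusters) > k:
        best = None
        ids = list(clusters)
        for ai in range(len(ids)):
            for bi in range(ai + 1, len(ids)):
                a, b = ids[ai], ids[bi]
                # Single linkage: distance between the closest members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return list(clusters.values()), merges


# Hypothetical 2D items: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three_clusters, first_merges = agglomerative(points, k=3)
_, full_hierarchy = agglomerative(points, k=1)
```

Running to `k=1` yields the full merge history, which is the derived cluster hierarchy; stopping earlier corresponds to cutting the dendrogram at a chosen level. A partitioning algorithm would instead fix the number of bins (k) up front and assign items directly.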