DV Basics
Data visualization (DV) is the process of displaying data or information as graphical charts, figures, and bars.
DV Elements / Visual Code
- Color depth: category
- Length / height: value
- Position: comparison
- Area: size or value
- Angle: proportion
- Line weight: value
- Labels: category
Successful visualization
- Truthful
- Functional
- Beautiful
- Insightful
- Enlightening
Insights gained from DV
- Spontaneous insight: sudden, surprising, and unexpected.
- Knowledge-building insight: based on a careful and steady exploration of information that doesn't always result in exciting discoveries.
Data Glyph
Line-based
- Stars
- Anderson/metroglyphs
- Circular profiles
Human-body-based
- Stick figures
- Faces
- Hedgehogs
Geometric-shape-based
- Profiles
- Boxes
- Polygons
Based on natural / man-made objects
- Trees
- Arrows
- Weathervanes
Color / texture-based
- Autoglyph: color of the boxes
- Color glyphs: colored lines across the boxes
- Dashtubes: texture and opacity
Misunderstanding / Bias
Reasoning
- Patternicity bug: The tendency to find patterns in random data.
- Storytelling bug: The inclination to create stories or explanations for data that may not have a basis in reality.
- Confirmation bug (beliefs): The tendency to search for, interpret, and remember information that confirms one's preconceptions.
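The patternicity bug can be made concrete with pure noise. The sketch below is only an illustration: the variable names, sample sizes, and seed are assumptions, not part of the original notes. It generates unrelated random columns and looks for the strongest pairwise correlation. With many pairs to choose from, some pair usually looks related by chance alone.

```r
# Sketch: spurious "patterns" in random data (all names and sizes are illustrative assumptions).
set.seed(42)                         # fixed seed so the run is repeatable
n_obs  <- 20                         # observations per variable
n_vars <- 50                         # number of unrelated random variables
noise  <- matrix(rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)

# Pairwise correlations between columns that are independent by construction
cors      <- cor(noise)
pair_cors <- cors[upper.tri(cors)]   # keep each pair once, drop the diagonal

strongest <- max(abs(pair_cors))
cat("Pairs examined:", length(pair_cors), "\n")
cat("Strongest |correlation| found in pure noise:", round(strongest, 2), "\n")
```

Any "relationship" reported here is an artifact of searching many pairs, which is the situation the patternicity bug describes.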
Glyph Biases
- Perception-based: Biases that arise from the way we perceive visual information.
- Proximity-based: Biases that occur due to the spatial arrangement of elements.
- Grouping-based: Biases that result from the tendency to group similar elements together.
Good Graphic
A good graphic complements other related material and fits in with it.
Practical considerations for a good graph
- Scale
- Sorting and Ordering
- Overlaying Graph
- Text
- Size, Ratios
- Colour
Present Data
- Decide what information you want to convey.
- Draw a display suited to the content and to the intended audience.
Explore Data
- find information
- generate ideas
Elements in designing a good graphic
- Colour
- Scale
- Text
- Overlaying Graph
R Plot Functions
Five layers of the grammar of graphics (ggplot2)
Data
df
Aesthetic mapping
aes(x = x, y = y, color = factor(group))
Geometric object
- geom_point() # scatter plot
- geom_line() # line chart
- geom_bar() # bar chart
All of these accept a color argument (for geom_bar(), color sets the bar outline; use fill for the bar interior).
Statistical transformation
stat_summary(fun = mean, geom = "line")
Position adjustment
position_jitter()
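e.g.
library(ggplot2)
# df, data, and data_box are assumed to be existing data frames with the columns named in aes().
# Scatter plot: a fixed color goes in the geom; scale_color_manual() only affects a mapped color aesthetic
ggplot(df, aes(x = x, y = y)) +
  geom_point(color = "blue") +
  labs(title = "Stock trend")
# Bar chart from precomputed values
ggplot(data, aes(x = category, y = values)) +
  geom_bar(stat = "identity") +
  labs(title = "Bar chart example", x = "Category", y = "Count")
# Box plot
ggplot(data_box, aes(x = category, y = values)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Box plot example", x = "Category", y = "Value")
# Line chart
ggplot(data, aes(x = x, y = y)) +
  geom_line(color = "blue", linewidth = 1) + # line color and width (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  geom_point(color = "red", size = 3) + # optional: add points
  labs(title = "Line chart example", x = "X axis", y = "Y axis")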
Plot Function
plot(x, y, # type defaults to "p" (points, i.e. a scatter plot) when not specified
     main = "title",
     xlab = "x label",
     ylab = "y label")
pie Function
pie(sizes, labels = labels, main = "Pie chart example", col = rainbow(length(sizes)))
Bar Function
barplot(heights, names.arg = labels, main = "Bar chart example",
        xlab = "Category", ylab = "Count", col = rainbow(length(heights)))
Dashboard
A dashboard is a visual display of the most important information needed to achieve one or more objectives, arranged on a single screen so that the information can be monitored in one place.
Common features
- Visual display, as a combination of text and graphics
- Graphics over text: graphics convey information with greater efficiency and meaning than text
- Display the information needed to achieve specific objectives
- Fit on a single computer screen
- Monitor information at a glance
- Small, concise, clear, and intuitive display mechanisms
- Customized
Dashboard types
Strategic Purpose Dashboard
- Give decision makers the quick overview they need for monitoring
- Check the health of the business
Analytical Purpose Dashboard
- Focus on comparisons
- More extensive history
- Subtler performance evaluators
- Support interactions with the data, such as drilling down into the underlying details
Operational Purpose Dashboard
- Monitor operations
- Maintain awareness of activities and events
- Data should be dynamic (data streaming)
- Should have alerts for events that need attention
- Keep track of awareness
Dashboard Visualization Module
Python
- dash
- plotly express
R^2 Question
R^2 > 0.7
- The data points in the scatter plot are not widely scattered;
- An observable central tendency is shown.
- Use linear regression, lm()
R^2 < 0.7
- The data points in the scatter plot are widely scattered;
- An observable central tendency cannot be found.
- Use a clustering method such as k-means, kmeans()
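The rule of thumb above can be sketched in base R. This is a minimal illustration under stated assumptions: the synthetic data, the helper name choose_method (hypothetical), and the choice of three clusters are not from the notes, and the 0.7 threshold is simply the one given above.

```r
# Sketch of the R^2 rule of thumb above; the data and helper name are illustrative assumptions.
set.seed(1)
x <- seq(1, 50)
y <- 3 * x + rnorm(50, sd = 5)             # strongly linear synthetic data

choose_method <- function(x, y, threshold = 0.7) {
  fit <- lm(y ~ x)                         # simple linear regression
  r2  <- summary(fit)$r.squared            # coefficient of determination
  if (r2 > threshold) {
    list(r2 = r2, method = "lm", model = fit)
  } else {
    # No clear linear trend: group the points instead (centers = 3 is an arbitrary choice)
    list(r2 = r2, method = "kmeans", model = kmeans(cbind(x, y), centers = 3))
  }
}

result <- choose_method(x, y)
cat("R^2 =", round(result$r2, 3), "-> use", result$method, "\n")

# The same helper applied to unrelated noise falls through to clustering
noisy <- choose_method(x, rnorm(50))
cat("R^2 =", round(noisy$r2, 3), "-> use", noisy$method, "\n")
```

Returning the fitted model together with R^2 keeps the decision and its evidence in one place, so the threshold can be adjusted without refitting by hand.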
Visual Elements
- Image: visualize multiple columns of data
- Shape: visualize a single column of data
- Color depth: represent various attitudes
- Value: visualize the magnitude or quantity of data
Choosing the visual elements for the data visualization
- If the data can be represented in a single dimension, use Shape.
- If the data needs multiple dimensions to be presented, use Image.
LED meanings in a figure
- Available, Occupied, and Reserved
Colour
- Red: Not allowed
- Green: Allowed
- Yellow: Warning
Chart
Bar Chart: Used to compare different categories or groups.
- Each bar represents a category, and the height or length of the bar indicates the value.
- Useful for visualizing categorical data and making comparisons easy to see.
Line Chart: Ideal for showing trends over time.
- Points are plotted on the graph and connected by lines to show changes over periods.
- Useful for time series data to identify trends, patterns, and fluctuations.
Pie Chart: Useful for showing proportions and percentages.
- Each slice of the pie represents a category's contribution to the whole.
- Best used for displaying data with a small number of categories.
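As a small illustration of "contribution to the whole", the sketch below turns raw counts into shares and uses them as slice labels. The category names and counts are made up for the example.

```r
# Sketch: converting counts to proportions for pie slices (made-up data).
counts <- c(A = 30, B = 50, C = 20)

shares <- prop.table(counts)                # each category's share of the total
labels <- paste0(names(counts), " (", round(100 * shares), "%)")
print(labels)

pie(counts, labels = labels, main = "Pie chart sketch",
    col = rainbow(length(counts)))
```

Putting the percentage in the label avoids forcing the reader to judge angles by eye.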
Scatter Plot: Great for showing relationships between two variables.
- Each point represents an observation with its position determined by the values of the two variables.
- Useful for identifying correlations, clusters, and outliers.
Histogram: Used to show the distribution of a dataset.
- Data is grouped into bins, and the height of each bin represents the frequency of data points within that range.
- Useful for understanding the shape, spread, and central tendency of the data.
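The binning step can be inspected directly. The sketch below uses synthetic data and the default break selection of hist(); both are assumptions made for illustration. It prints the bin edges and counts that the bars would show.

```r
# Sketch: how a histogram bins data (synthetic data; default bin selection).
set.seed(7)
values <- rnorm(200, mean = 50, sd = 10)

# plot = FALSE returns the binning without drawing, so the counts can be inspected
h <- hist(values, plot = FALSE)

bins <- data.frame(from  = head(h$breaks, -1),   # left edge of each bin
                   to    = h$breaks[-1],         # right edge of each bin
                   count = h$counts)             # bar height = frequency in that range
print(bins)
```

Calling hist(values) without plot = FALSE draws the same bins as bars.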
Box Plot: Useful for displaying the distribution of data based on a five-number summary.
- Describes the spread, median, and interquartile range (IQR) of the data.
- Identifies outliers and compares different categories.
- The box represents the IQR, the line inside the box shows the median, and the "whiskers" extend to the smallest and largest values within 1.5 * IQR from the quartiles.
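The quantities behind a box plot can be worked out by hand. The sketch below uses made-up values and quantile() for the quartiles. Note that boxplot() itself computes hinges through boxplot.stats(), which can differ slightly from quantile() on small samples, so treat this as an approximation of what the plot shows.

```r
# Sketch: the numbers behind a box plot (data are made up for illustration).
values <- c(12, 15, 14, 10, 18, 20, 16, 15, 13, 45)   # 45 is a deliberate outlier

q     <- quantile(values, c(0.25, 0.5, 0.75))   # Q1, median, Q3
iqr   <- q[3] - q[1]                            # interquartile range (height of the box)
lower <- q[1] - 1.5 * iqr                       # fences at 1.5 * IQR from the quartiles
upper <- q[3] + 1.5 * iqr

inside   <- values[values >= lower & values <= upper]
whiskers <- range(inside)                       # whiskers end at the most extreme points inside the fences
outliers <- values[values < lower | values > upper]

cat("Median:", unname(q[2]), " IQR:", unname(iqr), "\n")
cat("Whiskers:", whiskers, " Outliers:", outliers, "\n")
```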
Heatmap: Ideal for showing data density and variations.
- Uses color to represent values in a matrix, with different colors indicating different ranges of values.
- Useful for visualizing large datasets and identifying patterns, correlations, and anomalies.
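The idea of colors standing for value ranges can be sketched with base R. The matrix, the four ranges, and the palette below are illustrative assumptions. Each cell is assigned to a range, and image() draws one color per range.

```r
# Sketch: a matrix of values mapped to colour ranges (synthetic data; names are illustrative).
set.seed(3)
m <- matrix(runif(5 * 7, min = 0, max = 100), nrow = 5,
            dimnames = list(paste0("row", 1:5), paste0("col", 1:7)))

# Four value ranges, one colour each (reversed so that higher values are redder)
breaks <- c(0, 25, 50, 75, 100)
cols   <- rev(heat.colors(length(breaks) - 1))
bins   <- cut(m, breaks = breaks, include.lowest = TRUE)
print(table(bins))                   # how many cells fall in each range

# image() expects z with dimensions length(x) by length(y), so the matrix is transposed
image(x = 1:ncol(m), y = 1:nrow(m), z = t(m), breaks = breaks, col = cols,
      xlab = "column", ylab = "row", main = "Heatmap sketch")
```

With real data, the break points would normally be chosen from the data's own range rather than fixed in advance.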