杯子茶室

Focusing on interesting things

Data Visualization Basics


DV Basic

Data visualization (DV) is the process of displaying data or information as graphical charts, figures, and bars.

DV Elements / Visual Code

  • Color depth: category
  • Length / height: value
  • Position: compare
  • Area: size or value
  • Angle: proportion
  • Line weight: value
  • Labels: category

Successful visualization

  • Truthful
  • Functional
  • Beautiful
  • Insightful
  • Enlightening

Gaining insight from DV

  • Spontaneous insight: sudden, surprising, and unexpected.
  • Knowledge-based insight: comes from a careful and steady exploration of information, which doesn't always result in exciting discoveries.

Data Glyph

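  • Line-based

    • Stars
    • Anderson/metroglyphs
    • Circular profiles
  • Based on the human figure

    • Stick figures
    • Faces
    • Hedgehogs
  • Based on geometric shapes

    • Profiles
    • Boxes
    • Polygons
  • Based on natural/man-made objects

    • Trees
    • Arrows
    • Weathervanes
  • Based on color/texture

    • Autoglyph: color of the boxes
    • Color glyphs: colored lines across the box
    • Dashtubes: texture and opacity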

Misunderstanding / Bias

  • Reasoning

    • Patternicity bug: the tendency to find patterns in random data.
    • Storytelling bug: the inclination to create stories or explanations for data that may have no basis in reality.
    • Confirmation bug (beliefs): the tendency to search for, interpret, and remember information that confirms one's preconceptions.
  • Glyph biases

    • Perception-based: biases that arise from the way we perceive visual information.
    • Proximity-based: biases that occur due to the spatial arrangement of elements.
    • Grouping-based: biases that result from the tendency to group similar elements together.

Good Graphic

A good graphic complements other related material and fits in with it.

Practical considerations for a good graph

  • Scale
  • Sorting and Ordering
  • Overlaying Graph
  • Text
  • Size, Ratios
  • Colour
  • Present Data

    • Deciding what information you want to cover;
    • Drawing a display suited to the content and the intended audience.
  • Explore Data

    • Find information
    • Generate ideas

Elements in designing a good graphic

  • Colour
  • Scale
  • Text
  • Overlaying Graph

R Plot Functions

5 Layers of the Grammar of Graphics (ggplot2)

Data

df

Aesthetic mapping

aes(x = x, y = y, color=factor(group))

Geometric object

  • geom_point() # scatter plot
  • geom_line() # line chart
  • geom_bar() # bar chart

All of these accept a color argument.

Statistical transformation

stat_summary(fun = mean, geom = "line")

Position adjustment

position_jitter()

e.g.

library(ggplot2)
# No colour aesthetic is mapped here, so set the colour on the geom directly
ggplot(df, aes(x = x, y = y)) +
  geom_point(color = "blue") +
  labs(title = "Stock trend")

ggplot(data, aes(x = category, y = values)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  labs(title = "Bar chart example", x = "Category", y = "Count")

ggplot(data_box, aes(x = category, y = values)) +
  geom_boxplot(fill = "lightgreen") +
  labs(title = "Box plot example", x = "Category", y = "Value")

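ggplot(data, aes(x = x, y = y)) +
  geom_line(color = "blue", linewidth = 1) +  # line colour and width (linewidth needs ggplot2 >= 3.4.0; older versions use size)
  geom_point(color = "red", size = 3) +  # add points (optional)
  labs(title = "Line chart example", x = "X axis", y = "Y axis")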

Plot Function

plot(x, y,  # type = "p" (points) is the default, giving a scatter plot
    main = "title",
    xlab = "x label",
    ylab = "y label")

pie Function

pie(sizes, labels = labels, main = "Pie chart example", col = rainbow(length(sizes)))

Bar Function

barplot(heights, names.arg = labels, main = "Bar chart example",
        xlab = "Category", ylab = "Count", col = rainbow(length(heights)))

Dashboard

A dashboard is a visual display of the most important information needed to achieve one or more objectives, arranged on a single screen so that the information can be monitored in one place.

Common features

  • Visual display

    • A combination of text and graphics
    • Graphics > text, because graphics convey information with greater efficiency and meaning than text
  • Displays the information needed to achieve specific objectives
  • Fits on a single computer screen
  • Lets you monitor information at a glance
  • Small, concise, clear, and intuitive display mechanisms
  • Customized

Dashboard types

  • Strategic purpose dashboard

    • A quick overview of what decision makers need to monitor
    • Checks the health of the business
  • Analytical purpose dashboard

    • Focuses on comparisons
    • More extensive history
    • Subtler performance evaluators
    • Supports interactions with the data, such as drilling down into the underlying details
  • Operational purpose dashboard

    • Monitors operations
    • Maintains awareness of activities and events
    • Data should be dynamic (data streaming)
    • Should raise alerts for events that need attention
    • Keep track of awareness

Dashboard Visualization Module

  • Python

    • Dash
    • Plotly Express

R^2 Question

  • R^2 > 0.7

    • The data points in the scatter plot are not widely scattered;
    • An observable central tendency is shown.
    • Use linear regression, lm()
  • R^2 < 0.7

    • The data points in the scatter plot are widely scattered;
    • No observable central tendency can be found.
    • Use a clustering method such as k-means, kmeans()
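The rule above can be sketched in a few lines of plain Python. This is only an illustration, not part of the original notes: the function names `r_squared` and `suggest_method` and the sample data are made up for the example, and R^2 is computed for a simple least-squares line fitted to one x variable. The notes do not say what to do when R^2 is exactly 0.7, so the sketch treats that case as "clustering" by assumption.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def suggest_method(xs, ys, threshold=0.7):
    """Rule of thumb from the notes; R^2 == threshold falls to clustering (assumption)."""
    return "linear regression" if r_squared(xs, ys) > threshold else "clustering"


# Hypothetical sample data: one tight trend, one scattered cloud
tight = ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
loose = ([1, 2, 3, 4, 5, 6], [5, 1, 4, 6, 2, 3])

print(suggest_method(*tight))
print(suggest_method(*loose))
```

In R, the same R^2 value is reported by summary(lm(y ~ x))$r.squared.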

Visual Elements

  • Image: visualizes multiple columns of data
  • Shape: visualizes a single column of data
  • Color depth: represents various attitudes
  • Value: visualizes the magnitude or quantity of data

Choosing the visual elements for the data visualization

  • If the data can be represented by a single dimension, use Shape.
  • If the data needs multiple dimensions to present, use Image.

LED meaning in figure

  • Available, Occupied, and Reserved
  • Colour

    • Red: Not allowed
    • Green: Allowed
    • Yellow: Warning

Chart

  • Bar Chart: Used to compare different categories or groups.

    - Each bar represents a category, and the height or length of the bar indicates the value.
    - Useful for visualizing categorical data and making comparisons easy to see.

  • Line Chart: Ideal for showing trends over time.

    - Points are plotted on the graph and connected by lines to show changes over periods.
    - Useful for time series data to identify trends, patterns, and fluctuations.

  • Pie Chart: Useful for showing proportions and percentages.

    - Each slice of the pie represents a category's contribution to the whole.
    - Best used for displaying data with a small number of categories.

  • Scatter Plot: Great for showing relationships between two variables.

    - Each point represents an observation, with its position determined by the values of the two variables.
    - Useful for identifying correlations, clusters, and outliers.

  • Histogram: Used to show the distribution of a dataset.

    - Data is grouped into bins, and the height of each bar represents the frequency of data points within that bin's range.
    - Useful for understanding the shape, spread, and central tendency of the data.

  • Box Plot: Useful for displaying the distribution of data based on a five-number summary.

    - Describes the spread, median, and interquartile range (IQR) of the data.
    - Identifies outliers and compares different categories.
    - The box represents the IQR, the line inside the box shows the median, and the "whiskers" extend to the smallest and largest values within 1.5 * IQR from the quartiles.

  • Heatmap: Ideal for showing data density and variations.

    - Uses color to represent values in a matrix, with different colors indicating different ranges of values.
    - Useful for visualizing large datasets and identifying patterns, correlations, and anomalies.
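
To make the box plot description above concrete, here is a small sketch of the numbers a box plot is drawn from. It is an illustration under stated assumptions rather than how any particular plotting library works: the function name `box_plot_stats` and the sample values are made up, and the quartiles use the "inclusive" method of Python's statistics.quantiles. Other tools use slightly different quartile conventions (R's boxplot() uses hinges, for example), so their boxes may differ a little.

```python
import statistics


def box_plot_stats(values):
    """Return the quartiles, whisker ends, and outliers used to draw a box plot."""
    data = sorted(values)
    # Quartile convention is an assumption; other methods give slightly different cuts
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points that are still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical sample with one unusually large value
stats = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 30])
for name, value in stats.items():
    print(name, value)
```

For this sample the upper fence is 8 + 1.5 * 4 = 14, so 30 is reported as an outlier and the upper whisker stops at 9.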