r - Transform a data frame to make a waterfall chart in ggplot2

Reposted. Author: 行者123. Updated: 2023-12-04 01:46:55

I want to transform my data frame into a format suitable for a waterfall chart.

My data frame looks like this:

employee <- c('A','B','C','D','E','F', 
'A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F')
revenue <- c(10, 20, 30, 40, 10, 40,
8, 10, 20, 50, 20, 10,
2, 5, 70, 30, 10, 50,
40, 8, 30, 40, 10, 40)
date <- as.Date(c('2017-03-01','2017-03-01','2017-03-01',
'2017-03-01','2017-03-01','2017-03-01',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-04','2017-03-04','2017-03-04',
'2017-03-04','2017-03-04','2017-03-04'))
df <- data.frame(date,employee,revenue)

date employee revenue
1 2017-03-01 A 10
2 2017-03-01 B 20
3 2017-03-01 C 30
4 2017-03-01 D 40
5 2017-03-01 E 10
6 2017-03-01 F 40
7 2017-03-02 A 8
8 2017-03-02 B 10
9 2017-03-02 C 20
10 2017-03-02 D 50
11 2017-03-02 E 20
12 2017-03-02 F 10
13 2017-03-03 A 2
14 2017-03-03 B 5
15 2017-03-03 C 70
16 2017-03-03 D 30
17 2017-03-03 E 10
18 2017-03-03 F 50
19 2017-03-04 A 40
20 2017-03-04 B 8
21 2017-03-04 C 30
22 2017-03-04 D 40
23 2017-03-04 E 10
24 2017-03-04 F 40

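How can I transform this data frame into the form needed for a waterfall chart in ggplot2?

The amount column is the change in each employee's revenue compared with the previous day.

The end column is the start column plus the amount column (for example, 150 + (-2) = 148).

The start column is the end value of the previous day's Total row.

The final data frame should look like this: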

         date employee     start    end    amount    total_for_day
1 2017-03-01 A 0 10 10 10
2 2017-03-01 B 0 20 20 20
3 2017-03-01 C 0 30 30 30
4 2017-03-01 D 0 40 40 40
5 2017-03-01 E 0 10 10 10
6 2017-03-01 F 0 40 40 40
7 2017-03-01 Total 0 150 150 150
8 2017-03-02 A 150 148 -2 8
9 2017-03-02 B 150 140 -10 10
10 2017-03-02 C 150 140 -10 20
11 2017-03-02 D 150 160 10 50
12 2017-03-02 E 150 160 10 20
13 2017-03-02 F 150 120 -30 10
14 2017-03-02 Total 150 118 -32 98
15 2017-03-03 A 118 112 -6 2
16 2017-03-03 B 118 113 -5 5
17 2017-03-03 C 118 168 50 70
18 2017-03-03 D 118 98 -20 30
19 2017-03-03 E 118 108 -10 10
20 2017-03-03 F 118 158 40 50
21 2017-03-03 Total 118 167 49 170
22 2017-03-04 A 167 205 38 40
23 2017-03-04 B 167 170 3 8
24 2017-03-04 C 167 127 -40 30
25 2017-03-04 D 167 177 10 40
26 2017-03-04 E 167 167 0 10
27 2017-03-04 F 167 157 -10 40
28 2017-03-04 Total 167 168 1 168

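Best answer

There are a few steps to get you there, and I think the dplyr package will help (it is used heavily below).

My understanding is that revenue gives the cumulative running total, not the change per day. If that is wrong, you will need to reverse some of these calculations.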
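If the values turn out to be per-day changes instead, the reversal might look like the sketch below. It is only an illustration under stated assumptions: the `changes` data frame and its `dailyChange` column are hypothetical and not part of the question. A cumulative sum per employee recovers the running total (the top of each bar), and subtracting the change gives the bottom.

```r
library(dplyr)

# Hypothetical input: `dailyChange` holds the change per day, not a running total
changes <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:2, times = 2),
  employee = rep(c("A", "B"), each = 3),
  dailyChange = c(10, -2, -6, 20, -10, -5)
)

running <- changes %>%
  group_by(employee) %>%
  arrange(date, .by_group = TRUE) %>%       # cumsum() needs date order within each employee
  mutate(
    revenue = cumsum(dailyChange)           # running total: top of each bar
    , previousDay = revenue - dailyChange   # previous running total: bottom of each bar
  ) %>%
  ungroup()
```

The resulting `revenue` and `previousDay` columns play the same role as the columns of the same name used for plotting in this answer.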

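The first step is to create a new data.frame that calculates the daily totals and then bind it back onto the original data.frame. You can then group_by employee (including "Total") and add the columns that are computed separately for each employee: the previous day's value, the change, and whether that change is an increase or a decrease.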

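library(dplyr)
library(ggplot2)

toPlot <-
  bind_rows(
    df
    # daily totals, labelled as a pseudo-employee "Total"
    , df %>%
      group_by(date) %>%
      summarise(revenue = sum(revenue)) %>%
      mutate(employee = "Total")
  ) %>%
  group_by(employee) %>%
  # lag() relies on the rows already being in date order within each employee
  mutate(
    previousDay = lag(revenue, default = 0)
    , change = revenue - previousDay
    , direction = ifelse(change > 0
                         , "Positive"
                         , "Negative"))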

This returns:

         date employee revenue previousDay change direction
<date> <chr> <dbl> <dbl> <dbl> <chr>
1 2017-03-01 A 10 0 10 Positive
2 2017-03-01 B 20 0 20 Positive
3 2017-03-01 C 30 0 30 Positive
4 2017-03-01 D 40 0 40 Positive
5 2017-03-01 E 10 0 10 Positive
6 2017-03-01 F 40 0 40 Positive
7 2017-03-02 A 8 10 -2 Negative
8 2017-03-02 B 10 20 -10 Negative
9 2017-03-02 C 20 30 -10 Negative
10 2017-03-02 D 50 40 10 Positive
# ... with 18 more rows
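The question asked for start, end and amount columns, where every bar on a given day starts from the previous day's total. That layout is not what the code above produces, so here is a rough sketch of one way it could be derived. The names `totals` and `waterfall` are my own, and the data frame is rebuilt inline only so the example runs on its own. Note that this computes total_for_day on the Total rows as the plain daily sum (118 and 167 for the second and third day), which differs from the 98 and 170 shown in the question's table.

```r
library(dplyr)

# Rebuild the question's data (same values as in the question)
df <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:3, each = 6),
  employee = rep(LETTERS[1:6], times = 4),
  revenue = c(10, 20, 30, 40, 10, 40,
              8, 10, 20, 50, 20, 10,
              2, 5, 70, 30, 10, 50,
              40, 8, 30, 40, 10, 40),
  stringsAsFactors = FALSE
)

# Daily totals; `start` is the previous day's total (0 on the first day)
totals <- df %>%
  group_by(date) %>%
  summarise(revenue = sum(revenue)) %>%
  arrange(date) %>%
  mutate(employee = "Total", start = lag(revenue, default = 0))

waterfall <- bind_rows(df, totals %>% select(date, employee, revenue)) %>%
  group_by(employee) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(amount = revenue - lag(revenue, default = 0)) %>%   # change vs. previous day
  ungroup() %>%
  left_join(totals %>% select(date, start), by = "date") %>% # shared start for the day
  mutate(end = start + amount) %>%
  arrange(date, employee) %>%
  select(date, employee, start, end, amount, total_for_day = revenue)
```

With this layout, `start` and `end` could be mapped to `ymin` and `ymax` in `geom_rect()` in place of `previousDay` and `revenue`.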

We can then plot it with:

toPlot %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee
             , scales = "free_y") +
  scale_fill_brewer(palette = "Set1")

This gives:

[Figure: waterfall chart faceted by employee, including Total, with free y-axes]

Note that including "Total" throws off the scale (hence the need for free scales), so I would rather leave it out:

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee) +
  scale_fill_brewer(palette = "Set1")

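This allows a direct comparison between employees:

[Figure: waterfall chart faceted by employee, excluding Total, on a shared y-axis]

And here is the total on its own: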

toPlot %>%
  filter(employee == "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  scale_fill_brewer(palette = "Set1")

[Figure: waterfall chart of the daily Total]

That said, I still find a line chart easier to interpret (especially for comparing employees):

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = revenue
             , col = employee)) +
  geom_line() +
  # the lines use the colour aesthetic, so the palette must be set on the colour scale
  scale_color_brewer(palette = "Dark2")

[Figure: line chart of revenue by date, one line per employee]

If you want to plot the day-to-day changes themselves, you can do this:

toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = change
             , fill = employee)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Dark2")

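which gives:

[Figure: dodged bar chart of change by date, filled by employee]

But now you are a long way from a "waterfall" plot. If you really, really want a waterfall plot that lets you compare across employees in one plot, you can make one, but it will be ugly (I strongly recommend the line chart above).

Here you need to shift the boxes manually, and some tweaking is needed if you change the output aspect ratio (or size) or the number of employees. You also need to encode both the employee (as fill colour) and the direction of the change (as outline colour), which starts to look rough. This falls into the "you can, but you probably shouldn't" category; there is probably a better way to display these data.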

toPlot %>%
  filter(employee != "Total") %>%
  ungroup() %>%
  # give each employee a numeric x position so the boxes can be placed by hand
  mutate(empNumber = as.numeric(as.factor(employee))) %>%
  ggplot(aes(xmin = (empNumber) - 0.4
             , xmax = (empNumber) + 0.4
             , ymin = previousDay
             , ymax = revenue
             , col = direction
             , fill = employee)) +
  # on ggplot2 >= 3.4.0, use linewidth = 1.5 instead of size
  geom_rect(size = 1.5) +
  facet_grid(~date) +
  scale_fill_brewer(palette = "Dark2") +
  theme(axis.text.x = element_blank()
        , axis.ticks.x = element_blank())

This gives:

[Figure: waterfall chart faceted by date, one box per employee, filled by employee and outlined by direction]

Regarding "r - Transform a data frame to make a waterfall chart in ggplot2", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43050698/
