
r - How to aggregate data from years into decades and plot them?

Reposted · Author: 行者123 · Updated: 2023-12-02 18:21:31

This is the chart I want to reproduce:

To do this, I have to transform the Year column, because the chart's x-axis is in decades. How can I achieve this?

This is how I scraped the data from the website (https://ourworldindata.org/famines):

library(rvest)
library(dplyr)
library(tidyr)
library(ggplot2)

col_link <- "https://ourworldindata.org/famines#famines-by-world-region-since-1860"
col_page <- read_html(col_link)
col_table <- col_page %>%
  html_nodes("table#tablepress-73") %>%
  html_table() %>%
  .[[1]]

data1 <- col_table %>%
  select(Year, `Excess Mortality midpoint`)
   Year      `Excess Mortality midpoint`
   <chr>     <chr>
 1 1846–52   1,000,000
 2 1860-1    2,000,000
 3 1863-67   30,000
 4 1866-7    961,043
 5 1868      100,000
 6 1868-70   1,500,000
 7 1870–1871 1,000,000
 8 1876–79   750,000
 9 1876–79   7,176,346
10 1877–79   11,000,000
# ... with 67 more rows
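Without network access (or if the page layout changes), a small stand-in for `data1` can be typed in by hand from the ten rows printed above. This is a sketch only: `sample_famines` is a hypothetical name, the other 67 rows are not reproduced, and the en dashes are written as `\u2013` escapes so the script does not depend on the file's encoding.

```r
library(dplyr)  # dplyr re-exports tibble()

# Hypothetical offline stand-in for `data1`: the first ten rows printed above,
# kept as character columns, the way html_table() returned them.
sample_famines <- tibble(
  Year = c("1846\u201352", "1860-1", "1863-67", "1866-7", "1868",
           "1868-70", "1870\u20131871", "1876\u201379", "1876\u201379", "1877\u201379"),
  `Excess Mortality midpoint` = c("1,000,000", "2,000,000", "30,000", "961,043",
                                  "100,000", "1,500,000", "1,000,000", "750,000",
                                  "7,176,346", "11,000,000")
)

sample_famines
```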

Best answer

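First, to convert periods into decades, you need to extract a single year from each period and base the calculation on it. From your comment above, it seems you want the end year of each period. Given the data, the code below does this with regular expressions (using the packages dplyr and stringr).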

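library(stringr)  # str_sub()

col_table <- col_table %>%
  mutate(Year = case_when(
    # a single year, e.g. 1868
    grepl("^\\d{4}$", Year) ~ Year,
    # full end year, e.g. 1878-1880: keep the last four characters
    grepl("\\d{4}[–-]\\d{4}", Year) ~ str_sub(Year, start = -4),
    # two-digit end year, e.g. 1846–52: century from the start, last two digits from the end
    grepl("\\d{4}[–-]\\d{2}$", Year) ~ paste0(str_sub(Year, 1, 2), str_sub(Year, -2)),
    # one-digit end year, e.g. 1860-1: first three digits from the start, last digit from the end
    grepl("\\d{4}[–-]\\d{1}$", Year) ~ paste0(str_sub(Year, 1, 3), str_sub(Year, -1))
  ))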

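This code detects the different period formats and extracts the correct year. Below is an example of each format that occurs in the dataset, together with the year the code produces for it: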

  • 1868 -> 1868
  • 1878-1880 -> 1880
  • 1846–52 -> 1852
  • 1860-1 -> 1861
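The same end-year rule can also be written as one regular expression with capture groups, a different technique from the answer's `case_when()` chain: take the four-digit start year, then overwrite its last digits with whatever follows the dash. A minimal sketch, assuming periods always look like the four formats listed above; `end_year()` is a hypothetical helper, and strings that fit none of the formats come back as `NA`.

```r
# Hypothetical helper: returns the end year of each period as a string.
end_year <- function(period) {
  # Group 1: four-digit start year.
  # Group 2 (optional): 1 to 4 digits after a hyphen or an en dash (\u2013).
  pattern <- "^(\\d{4})(?:[-\u2013](\\d{1,4}))?$"
  start  <- sub(pattern, "\\1", period, perl = TRUE)
  suffix <- sub(pattern, "\\2", period, perl = TRUE)  # "" when there is no dash
  # Replace the last nchar(suffix) digits of the start year with the suffix.
  out <- paste0(substr(start, 1, 4 - nchar(suffix)), suffix)
  # Strings that do not fit the pattern become NA instead of garbage.
  out[!grepl(pattern, period, perl = TRUE)] <- NA_character_
  out
}

end_year(c("1868", "1878-1880", "1846\u201352", "1860-1"))
#> [1] "1868" "1880" "1852" "1861"
```

Because the length of the suffix decides how many digits are replaced, one pattern covers the one-, two- and four-digit cases at once.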

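Now that each period has a single year, the next step is to derive the decade. For this, the Year column must be numeric; we then subtract the year's remainder modulo 10 (see: https://stackoverflow.com/a/48966643/8864619).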

col_table <- col_table %>%
  mutate(Decade = as.numeric(Year) - as.numeric(Year) %% 10)
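`Year %% 10` is the last digit of the year, so subtracting it rounds the year down to the first year of its decade. A short worked example using end years taken from the rows shown in the question:

```r
# End years of 1846–52, 1860-1, 1868, 1868-70 and 1876–79.
years <- c(1852, 1861, 1868, 1870, 1879)

years %% 10           # last digit of each year
#> [1] 2 1 8 0 9
years - years %% 10   # rounded down to the decade
#> [1] 1850 1860 1860 1870 1870

# Equivalent formulation with integer division.
identical(years - years %% 10, (years %/% 10) * 10)
#> [1] TRUE
```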

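To reproduce the chart, we need to group by decade and make sure the `Excess Mortality midpoint` column is numeric, so that we can sum the number of victims per decade.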

col_table <- col_table %>%
  # remove thousands separators before converting to numbers
  mutate(`Excess Mortality midpoint` = as.numeric(gsub(",", "", `Excess Mortality midpoint`))) %>%
  group_by(Decade) %>%
  summarize(val = sum(`Excess Mortality midpoint`)) %>%
  ungroup()
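To check the grouping and summing step without scraping, the end years and midpoints of the ten rows printed in the question can be run through the same pipeline. A sketch only: `first_rows` and `by_decade` are hypothetical names, and the totals cover those ten rows, not the full 77-row table.

```r
library(dplyr)

# End years (already extracted) and raw midpoints of the first ten rows.
first_rows <- tibble(
  Year = c(1852, 1861, 1867, 1867, 1868, 1870, 1871, 1879, 1879, 1879),
  `Excess Mortality midpoint` = c("1,000,000", "2,000,000", "30,000", "961,043",
                                  "100,000", "1,500,000", "1,000,000", "750,000",
                                  "7,176,346", "11,000,000")
)

by_decade <- first_rows %>%
  mutate(
    Decade = Year - Year %% 10,
    # Drop the thousands separators, then convert to numbers.
    `Excess Mortality midpoint` = as.numeric(gsub(",", "", `Excess Mortality midpoint`))
  ) %>%
  group_by(Decade) %>%
  summarize(val = sum(`Excess Mortality midpoint`)) %>%
  ungroup()

by_decade
```

For these rows the totals come to 1,000,000 for the 1850s, 3,091,043 for the 1860s and 21,426,346 for the 1870s.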

For the plot itself, use ggplot2:

ylab <- c(5, 10, 15, 20, 25)   # y-axis breaks, in millions
options(scipen = 999)          # avoid scientific notation in the bar labels
p <- ggplot(data = col_table, aes(x = factor(Decade), y = val)) +
  geom_col(fill = "navy") +
  # label each decade as "1850s", "1860s", ...
  scale_x_discrete(labels = function(x) paste0(x, "s")) +
  # trim = TRUE stops format() from padding all labels to a common width
  geom_text(aes(label = format(val, big.mark = ",", trim = TRUE)), size = 2, vjust = -0.3) +
  scale_y_continuous(labels = paste(ylab, "millions"), breaks = 10^6 * ylab) +
  ggtitle("Famine victims worldwide") +
  theme(panel.background = element_blank(),
        panel.border = element_blank(),
        panel.grid.major.x = element_blank(),
        # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
        panel.grid.major.y = element_line(linewidth = 0.05, linetype = "solid",
                                          colour = "black"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())
p
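The fixed `ylab` vector only draws breaks up to 25 million, so a decade with a larger total would get no gridline or label above that value. A sketch of an alternative, assuming it is acceptable to let ggplot2 pick the breaks: pass labelling functions to the scales. `decade_label()` and `millions_label()` are hypothetical helpers, and `by_decade` holds the ten-row toy totals rather than the scraped data.

```r
library(ggplot2)

# Hypothetical helpers: a scale's `labels` argument can be a function that
# receives the break values and returns the text to display.
decade_label   <- function(x) paste0(x, "s")
millions_label <- function(x) paste(x / 1e6, "millions")

# Toy input: decade totals for the first ten rows of the question's table.
by_decade <- data.frame(Decade = c(1850, 1860, 1870),
                        val    = c(1000000, 3091043, 21426346))

p <- ggplot(by_decade, aes(x = factor(Decade), y = val)) +
  geom_col(fill = "navy") +
  scale_x_discrete(labels = decade_label) +
  scale_y_continuous(labels = millions_label) +  # breaks chosen automatically
  labs(title = "Famine victims worldwide", x = NULL, y = NULL)
```

Calling `p` draws the chart; the y-axis labels then follow whatever breaks ggplot2 selects for the data range.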

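Putting everything together, the following code builds the Year column and the corresponding Decade column, and then uses them to create the plot you want: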

library(rvest)
library(dplyr)
library(stringr)
library(ggplot2)

col_link <- "https://ourworldindata.org/famines#famines-by-world-region-since-1860"
col_page <- read_html(col_link)
col_table <- col_page %>% html_nodes("table#tablepress-73") %>% html_table() %>% .[[1]]

col_table <- col_table %>%
  # reduce each period to its end year
  mutate(Year = case_when(
    grepl("^\\d{4}$", Year) ~ Year,
    grepl("\\d{4}[–-]\\d{4}", Year) ~ str_sub(Year, start = -4),
    grepl("\\d{4}[–-]\\d{2}$", Year) ~ paste0(str_sub(Year, 1, 2), str_sub(Year, -2)),
    grepl("\\d{4}[–-]\\d{1}$", Year) ~ paste0(str_sub(Year, 1, 3), str_sub(Year, -1))
  )) %>%
  # round the end year down to the start of its decade
  mutate(Decade = as.numeric(Year) - as.numeric(Year) %% 10) %>%
  # strip thousands separators, then sum the deaths per decade
  mutate(`Excess Mortality midpoint` = as.numeric(gsub(",", "", `Excess Mortality midpoint`))) %>%
  group_by(Decade) %>%
  summarize(val = sum(`Excess Mortality midpoint`)) %>%
  ungroup()

ylab <- c(5, 10, 15, 20, 25)
options(scipen = 999)
p <- ggplot(data = col_table, aes(x = factor(Decade), y = val)) +
  geom_col(fill = "navy") +
  scale_x_discrete(labels = function(x) paste0(x, "s")) +
  geom_text(aes(label = format(val, big.mark = ",", trim = TRUE)), size = 2, vjust = -0.3) +
  scale_y_continuous(labels = paste(ylab, "millions"), breaks = 10^6 * ylab) +
  ggtitle("Famine victims worldwide") +
  theme(panel.background = element_blank(),
        panel.border = element_blank(),
        panel.grid.major.x = element_blank(),
        # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
        panel.grid.major.y = element_line(linewidth = 0.05, linetype = "solid",
                                          colour = "black"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())
p
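Any `Year` string that matches none of the four patterns is turned into `NA` by `case_when()`, and its deaths then land in an `NA` decade instead of a real one. A sketch of a check for this, assuming the same patterns as the answer; `unparsed_years()` is a hypothetical helper and the example strings are made up for illustration.

```r
# Hypothetical helper: returns the raw Year strings that none of the
# case_when() patterns recognise (\u2013 is the en dash).
unparsed_years <- function(year) {
  patterns <- c("^\\d{4}$",
                "\\d{4}[\u2013-]\\d{4}",
                "\\d{4}[\u2013-]\\d{2}$",
                "\\d{4}[\u2013-]\\d{1}$")
  # TRUE where at least one pattern matches.
  matched <- Reduce(`|`, lapply(patterns, grepl, x = year))
  unique(year[!matched])
}

# Illustrative input: hypothetical strings; the last two fit no pattern.
unparsed_years(c("1868", "1846\u201352", "1860-1", "1941-1944", "c. 1900", "1920s"))
#> [1] "c. 1900" "1920s"
```

On real data it would be called on the freshly scraped `col_table$Year`, before the pipeline overwrites `col_table`.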

Here is the reproduced plot:

For the original question, "r - How to aggregate data from years into decades and plot them?", see Stack Overflow: https://stackoverflow.com/questions/70817735/
