
r - Create new ratio indicators in long data

Reposted. Author: 行者123. Updated: 2023-12-03 02:48:29

I have a data frame in long format:

mydf <- data.frame(
  date=c("2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01"),
  value=c(1,2,3,4,5,1,2,3,4,5),
  country=c("US", "US", "US", "US", "US", "US", "US", "US", "US", "US"),
  indicator=c("gdp", "gdp", "gdp", "gdp", "gdp", "population", "population", "population", "population", "population"))

         date value country  indicator
1  2016-01-01     1      US        gdp
2  2016-02-01     2      US        gdp
3  2016-03-01     3      US        gdp
4  2016-04-01     4      US        gdp
5  2016-05-01     5      US        gdp
6  2016-02-01     1      US population
7  2016-03-01     2      US population
8  2016-04-01     3      US population
9  2016-05-01     4      US population
10 2016-06-01     5      US population

I want to create specific new indicators from ratios, e.g. GDP/population*1000.

The result should look like this, and each ratio must be computed from the values of its indicators on the same date:

mydf <- data.frame(
  date=c("2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01"),
  value=c(1,2,3,4,5,1,2,3,4,5,2,1.5,1.33,1.25),
  country=c("US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US"),
  indicator=c("gdp", "gdp", "gdp", "gdp", "gdp", "population", "population", "population", "population", "population", "gdp per capita", "gdp per capita", "gdp per capita", "gdp per capita"))

         date value country      indicator
1  2016-01-01  1.00      US            gdp
2  2016-02-01  2.00      US            gdp
3  2016-03-01  3.00      US            gdp
4  2016-04-01  4.00      US            gdp
5  2016-05-01  5.00      US            gdp
6  2016-02-01  1.00      US     population
7  2016-03-01  2.00      US     population
8  2016-04-01  3.00      US     population
9  2016-05-01  4.00      US     population
10 2016-06-01  5.00      US     population
11 2016-02-01  2.00      US gdp per capita
12 2016-03-01  1.50      US gdp per capita
13 2016-04-01  1.33      US gdp per capita
14 2016-05-01  1.25      US gdp per capita

Is there a simple way to do this in R?

Best answer

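Personally, I find the reshape package easier to use, and it automatically handles multiple countries and however many other labels or data types you have.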

library(reshape)

# Sample data: gdp, population and gdp per capita for the US,
# plus one AU gdp row to show that several countries are handled
mydf <- data.frame(
  date=c("2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
         "2016-06-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01","2016-05-01"),
  value=c(1,2,3,4,5,1,2,3,4,5,2,1.5,1.33,1.2, 2),
  country=c("US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", "US", 'AU'),
  indicator=c("gdp", "gdp", "gdp", "gdp", "gdp", "population", "population", "population",
              "population", "population", "gdp per capita", "gdp per capita", "gdp per capita", "gdp per capita", 'gdp'))

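To get the new indicators, first cast the data to wide format so that the related columns sit next to each other. Then you can do simple column-wise operations: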

df_wide <- cast(mydf, date + country ~ indicator, sum)

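You want country and date as the columns that uniquely define a row (the left-hand side of the formula) and the different indicators as columns (the right-hand side of the formula). Because sum is the aggregation function, date/country combinations with no data show up as 0 rather than NA: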

        date country gdp gdp per capita population
1 2016-01-01      US   1           0.00          0
2 2016-02-01      US   2           2.00          1
3 2016-03-01      US   3           1.50          2
4 2016-04-01      US   4           1.33          3
5 2016-05-01      AU   2           0.00          0
6 2016-05-01      US   5           1.20          4
7 2016-06-01      US   0           0.00          5
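If you would rather not depend on the reshape package, base R's reshape() function can do the same long-to-wide step. The sketch below assumes the question's original mydf with dates kept as character strings; unlike cast(..., sum), it leaves combinations with no data as NA, so a later ratio does not silently turn into Inf or 0. Stripping the "value." prefix from the new column names is only a naming choice.

```r
# Long data from the question; dates stay as character strings
mydf <- data.frame(
  date = c("2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
           "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01"),
  value = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  country = "US",
  indicator = rep(c("gdp", "population"), each = 5),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per date/country pair, one column per indicator.
# Date/country pairs with no value for an indicator get NA, not 0.
wide <- reshape(mydf, idvar = c("date", "country"), timevar = "indicator",
                direction = "wide")

# reshape() names the new columns "value.gdp" and "value.population";
# drop the "value." prefix so the columns read like cast()'s output
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```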

Now create a new column and set it to whatever you want, here the GDP/population ratio:

df_wide$g_p_ratio <- df_wide$gdp / df_wide$population
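Because R arithmetic is vectorised, the new column is computed row by row, so each date's GDP is divided by the population on that same date. A small sketch on a hand-built wide frame, where NA marks missing data; the column names gdp_per_capita and gdp_per_capita_x1000 are illustrative, not from the answer. Note that is.finite() drops NA, NaN and Inf, but it cannot catch a missing GDP that was filled with 0 (0/5 is just 0), which is one reason to prefer NA over 0 for missing values.

```r
# A wide frame like the one the cast/reshape step produces (NA = no data)
wide <- data.frame(
  date = c("2016-01-01", "2016-02-01", "2016-03-01",
           "2016-04-01", "2016-05-01", "2016-06-01"),
  country = "US",
  gdp = c(1, 2, 3, 4, 5, NA),
  population = c(NA, 1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Vectorised division: each row's gdp is divided by the same row's population;
# NA propagates wherever either value is missing
wide$gdp_per_capita <- wide$gdp / wide$population

# The question's "GDP / population * 1000" is the same ratio rescaled
wide$gdp_per_capita_x1000 <- wide$gdp / wide$population * 1000

# is.finite() is FALSE for NA, NaN and Inf, so only computable ratios remain
ratios <- wide[is.finite(wide$gdp_per_capita), c("date", "country", "gdp_per_capita")]
print(ratios)
```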

Then use melt to turn it back into long format:

df_new <- melt(df_wide, id = c('date', 'country'))

Voilà!

                       date country value      indicator
gdp              2016-01-01      US  1.00            gdp
gdp.1            2016-02-01      US  2.00            gdp
gdp.2            2016-03-01      US  3.00            gdp
gdp.3            2016-04-01      US  4.00            gdp
gdp.4            2016-05-01      AU  2.00            gdp
gdp.5            2016-05-01      US  5.00            gdp
gdp.6            2016-06-01      US  0.00            gdp
gdp.per.capita   2016-01-01      US  0.00 gdp per capita
gdp.per.capita.1 2016-02-01      US  2.00 gdp per capita
gdp.per.capita.2 2016-03-01      US  1.50 gdp per capita
gdp.per.capita.3 2016-04-01      US  1.33 gdp per capita
gdp.per.capita.4 2016-05-01      AU  0.00 gdp per capita
gdp.per.capita.5 2016-05-01      US  1.20 gdp per capita
gdp.per.capita.6 2016-06-01      US  0.00 gdp per capita
population       2016-01-01      US  0.00     population
population.1     2016-02-01      US  1.00     population
population.2     2016-03-01      US  2.00     population
population.3     2016-04-01      US  3.00     population
population.4     2016-05-01      AU  0.00     population
population.5     2016-05-01      US  4.00     population
population.6     2016-06-01      US  5.00     population
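The melted output above contains no g_p_ratio rows. A base-R sketch of the full round trip, which appends the ratio as new rows of the original long data in the layout of the question's target table, could look like this. It swaps reshape's cast()/melt() for base R's reshape() and rbind(); the label "gdp per capita" is taken from the question.

```r
# Original long data from the question
mydf <- data.frame(
  date = c("2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
           "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01"),
  value = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  country = "US",
  indicator = rep(c("gdp", "population"), each = 5),
  stringsAsFactors = FALSE
)

# Wide step: columns value.gdp and value.population, NA where data is missing
wide <- reshape(mydf, idvar = c("date", "country"), timevar = "indicator",
                direction = "wide")
wide$ratio <- wide$value.gdp / wide$value.population

# Keep only dates where both indicators exist, using the same columns as mydf
ok <- !is.na(wide$ratio)
new_rows <- data.frame(
  date = wide$date[ok],
  value = wide$ratio[ok],
  country = wide$country[ok],
  indicator = "gdp per capita",
  stringsAsFactors = FALSE
)

# Append to the original long data (rbind matches columns by name),
# then reset the row names to 1..n
result <- rbind(mydf, new_rows)
rownames(result) <- NULL
print(result)
```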

You may or may not want new row labels, but you can reset them easily:

rownames(df_new) <- 1:nrow(df_new)
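Another option is to skip the wide format entirely: take the two indicators as separate subsets, merge() them on date and country, and divide. A sketch, where add_ratio_indicator() is a hypothetical helper (not part of any package) and the input is assumed to have exactly the columns date, value, country and indicator. Because date and country are both join keys, several countries are handled at once, and date/country pairs that lack either indicator (like the AU gdp row) are dropped by the inner join.

```r
# Hypothetical helper (not from any package): append num / den * scale
# as a new indicator, matching rows on date and country.
# Assumes df has exactly the columns date, value, country, indicator.
add_ratio_indicator <- function(df, num, den, new_name, scale = 1) {
  a <- df[df$indicator == num, c("date", "country", "value")]
  b <- df[df$indicator == den, c("date", "country", "value")]
  # Inner join: only date/country pairs present for both indicators survive
  m <- merge(a, b, by = c("date", "country"), suffixes = c(".num", ".den"))
  new_rows <- data.frame(
    date = m$date,
    value = m$value.num / m$value.den * scale,
    country = m$country,
    indicator = rep(new_name, nrow(m)),
    stringsAsFactors = FALSE
  )
  out <- rbind(df, new_rows)  # rbind matches data frame columns by name
  rownames(out) <- NULL
  out
}

# Usage: the question's data plus an AU gdp row that has no population
mydf <- data.frame(
  date = c("2016-01-01", "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01",
           "2016-02-01", "2016-03-01", "2016-04-01", "2016-05-01", "2016-06-01",
           "2016-05-01"),
  value = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 2),
  country = c(rep("US", 10), "AU"),
  indicator = c(rep("gdp", 5), rep("population", 5), "gdp"),
  stringsAsFactors = FALSE
)

res <- add_ratio_indicator(mydf, "gdp", "population", "gdp per capita")
print(res[res$indicator == "gdp per capita", ])
```

Passing scale = 1000 gives the question's GDP/population*1000 variant.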

A similar question, "r - Create new ratio indicators in long data", can be found on Stack Overflow: https://stackoverflow.com/questions/47667022/
