r - Combine migration in and out data by different common columns

Reposted. Author: 行者123. Updated: 2023-12-05 03:17:11

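I have two datasets: one of migration inflow into County A from other counties, and another of migration outflow from County A to other counties. I would like to combine the two datasets as follows:

Desired output: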

Key        County           State  FIPS   Inflow  Outflow  FiscalYear   Year
510012012  Accomack County  VA     51001  NA      27       2011 - 2012  2012
160012012  Ada County       ID     16001  12      18       2011 - 2012  2012
80012012   Adams County     CO     8001   22      39       2011 - 2012  2012
80012011   Adams County     CO     8001   42      31       2010 - 2011  2011
450032012  Aiken County     SC     45003  NA      21       2011 - 2012  2012
120012012  Alachua County   FL     12001  433     NA       2011 - 2012  2012

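Part of the problem is that the common columns do not have the same number of rows. Another issue is that different counties can belong to the same state, as the dummy data below shows. So my idea is to build a key by concatenating FIPS (unique to each county) and Year. That way I can combine each county, its state, and the values of the remaining related columns in a single row.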
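The key described above could be sketched as follows. This is only a minimal illustration: the vectors `fips` and `year` are hypothetical stand-ins for the FIPS and Year columns, filled with values taken from the desired output.

```r
# Hypothetical stand-ins for the FIPS and Year columns
fips <- c(51001L, 16001L, 8001L, 8001L)
year <- c(2012L, 2012L, 2012L, 2011L)

# Concatenate FIPS and Year into one key per county-year
key <- paste0(fips, year)
key

# TRUE when no county-year appears twice
anyDuplicated(key) == 0
```

Because Year always contributes the last four digits, the same county in two different years gets two different keys, as with Adams County here.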

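How can I merge the two into one dataset without hard-coding every common county and state name, FIPS, and year? Missing values should be NA.

My original migration outflow data has 517 observations and the inflow data has 441, so the number of counties differs between the two datasets.

Sample data: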

# People moving into county A from other counties
inflow_df = structure(list(
  Origin_FIPS = c(12001L, 8001L, 16001L, 12001L, 8001L, 16001L),
  Origin_StateName = c("FL", "CO", "ID", "FL", "CO", "ID"),
  Origin_Place = c("Alachua County", "Adams County", "Ada County",
                   "Alachua County", "Adams County", "Ada County"),
  InIndividuals = c(433L, 30L, 16L, 381L, 42L, 21L),
  FiscalYear = c("2011 - 2012", "2011 - 2012", "2011 - 2012",
                 "2010 - 2011", "2010 - 2011", "2010 - 2011"),
  Year = c(2012L, 2012L, 2012L, 2011L, 2011L, 2011L),
  Key = c(120012012L, 80012012L, 160012012L,
          120012011L, 80012011L, 160012011L)
), class = "data.frame", row.names = c(NA, -6L))

# People moving out of county A to other counties
outflow_df = structure(list(
  Dest_FIPS = c(51001L, 16001L, 8001L, 8001L, 45003L),
  Dest_StateName = c("VA", "ID", "CO", "CO", "SC"),
  Dest_Place = c("Accomack County", "Ada County", "Adams County",
                 "Adams County", "Aiken County"),
  OutIndividuals = c(27L, 16L, 39L, 31L, 21L),
  FiscalYear = c("2011 - 2012", "2011 - 2012", "2011 - 2012",
                 "2010 - 2011", "2011 - 2012"),
  Year = c(2012L, 2012L, 2012L, 2011L, 2012L),
  Key = c(510012012L, 160012012L, 80012012L, 80012011L, 450032012L)
), class = "data.frame", row.names = c(NA, -5L))

Best answer

Perhaps this helps

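library(dplyr)
library(tidyr)
library(stringr)
library(data.table)  # for rowid()

# Note: this pipeline expects columns ending in `_County` and `_Individuals`;
# the sample data above names them `*_Place`, `InIndividuals`, `OutIndividuals`.
# Stack both data frames, tagging each row with the name of its source
bind_rows(lst(inflow_df, outflow_df), .id = 'datname') %>%
  # Keep only the part of each name after the last underscore
  pivot_longer(cols = contains("_"), names_to = ".value",
               names_pattern = ".*_([^_]+$)") %>%
  # Build the key and number repeated keys within each source
  mutate(Key = str_c(County, Year), rn = rowid(Key, datname)) %>%
  # Spread the counts into one column per source data frame
  pivot_wider(names_from = datname, values_from = Individuals) %>%
  arrange(rn) %>%
  select(-rn)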
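As an alternative sketch that works directly with the column names in the question's sample data, the two data frames can be renamed to a shared schema and combined with a full join. This is not the answer's method. The intermediate names `inflow`, `outflow`, and `combined` are assumptions introduced here, and the sample data is rebuilt compactly so that the block runs on its own.

```r
library(dplyr)

# Sample data from the question, rebuilt compactly
inflow_df <- data.frame(
  Origin_FIPS      = rep(c(12001L, 8001L, 16001L), 2),
  Origin_StateName = rep(c("FL", "CO", "ID"), 2),
  Origin_Place     = rep(c("Alachua County", "Adams County", "Ada County"), 2),
  InIndividuals    = c(433L, 30L, 16L, 381L, 42L, 21L),
  FiscalYear       = rep(c("2011 - 2012", "2010 - 2011"), each = 3),
  Year             = rep(c(2012L, 2011L), each = 3),
  Key              = c(120012012L, 80012012L, 160012012L,
                       120012011L, 80012011L, 160012011L),
  stringsAsFactors = FALSE
)
outflow_df <- data.frame(
  Dest_FIPS      = c(51001L, 16001L, 8001L, 8001L, 45003L),
  Dest_StateName = c("VA", "ID", "CO", "CO", "SC"),
  Dest_Place     = c("Accomack County", "Ada County", "Adams County",
                     "Adams County", "Aiken County"),
  OutIndividuals = c(27L, 16L, 39L, 31L, 21L),
  FiscalYear     = c("2011 - 2012", "2011 - 2012", "2011 - 2012",
                     "2010 - 2011", "2011 - 2012"),
  Year           = c(2012L, 2012L, 2012L, 2011L, 2012L),
  Key            = c(510012012L, 160012012L, 80012012L, 80012011L, 450032012L),
  stringsAsFactors = FALSE
)

# Give both data frames the same names for the columns they share
inflow <- inflow_df %>%
  rename(FIPS = Origin_FIPS, State = Origin_StateName,
         County = Origin_Place, Inflow = InIndividuals)
outflow <- outflow_df %>%
  rename(FIPS = Dest_FIPS, State = Dest_StateName,
         County = Dest_Place, Outflow = OutIndividuals)

# A full join keeps county-years found in either data frame;
# counts missing on one side are filled with NA
combined <- full_join(
  inflow, outflow,
  by = c("Key", "County", "State", "FIPS", "FiscalYear", "Year")
) %>%
  select(Key, County, State, FIPS, Inflow, Outflow, FiscalYear, Year) %>%
  arrange(County, desc(Year))

combined
```

With the sample data this gives eight rows: the six inflow county-years plus Accomack County and Aiken County, which appear only in the outflow data. Joining on all shared columns rather than on `Key` alone avoids duplicated `.x`/`.y` copies of County, State, and the other common columns.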

Regarding "r - Combine migration in and out data by different common columns", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74238885/
