
r - Tidying data whose variables contain many groups/pairs of values

Reposted. Author: 行者123. Updated: 2023-12-04 11:46:32

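How can I use tidyverse functions to tidy this data so that the markets, sectors, sub-sectors, and their corresponding values (that is, the right-hand side of each = sign within a cell, e.g. 0.2934) become more useful? Is there a form and a method that would put this information into separate rows or columns?

Here is my toy data: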

df <- tibble::tribble(
~var1, ~year, ~Markets, ~Sectors,
"AA", 2015, "A=0.2934;B=0.1483;C=0.5583", "Technology=0.0566;Health Care=0.1396;Financial=0.0925;Consumer Staples=0.0642;C=0.4252;Basic Materials=0.0358",
"BB", 2015, "D=0.8548;E=0.0869;A=0.0529", "Technology=0.1924;Financial=0.3262;Communications=0.0844;Consumer Discretionary=0.1181;Utilities=0.0484",
"CC", 2015, "A=0.4159;C=0.3615;B=0.1522;D=0.0665;F=0.0018;E=0.0022", "Technology=0.0733;Consumer Discretionary=0.0788;Financial=0.1401;Industrials=0.0691;Energy=0.0377;C=0.3598",
"BB", 2019, "C=22.2;G=16.4;H=9.9;I=9.3;J=6.6", "C=23.3;Financials=21.8;Consumer Staples=11.3;Industrials=10.8;Consumer Discretionary=10.1;Information Technology=8.6",
"CC", 2019, "C=23.9;K=12.7;L=12.2;M=11.2;N=9.6;O=7.8", "C=33.4;Financials=25.6;Consumer Discretionary=6.8;Information Technology=6.7;Energy=5.8;Consumer Staples=5.6",
"DD", 2019, "N=82.4;C=13.9;P=1.1;Q=1.0;R=0.5;S=0.3;T=0.3;U=0.1", "Information Technology=19.9;Financials=14.8;C=13.7;Health Care=11.8;Consumer Discretionary=11.7;Industrials=9.1")

My real data has many more variables like these, and each of them holds even more values per cell.

Best answer

You can do the following.

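First, split the values on ;. Because read.table() (which read.csv2() wraps) infers the number of columns from the first five lines of input, col.names is set to the width of the widest row so that longer rows further down are not misread.

n_markets <- max(lengths(strsplit(df$Markets, ";", fixed = TRUE)))
Markets <- read.csv2(text = df$Markets, header = FALSE, stringsAsFactors = FALSE, col.names = paste0("V", seq_len(n_markets)))
n_sectors <- max(lengths(strsplit(df$Sectors, ";", fixed = TRUE)))
Sectors <- read.csv2(text = df$Sectors, header = FALSE, stringsAsFactors = FALSE, col.names = paste0("V", seq_len(n_sectors)))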
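To see what the split on ; produces, and how uneven the cells are, it can help to inspect a couple of cells directly. A minimal sketch in base R; the vector name `markets` and the choice of these two cells from the toy data are only for illustration:

```r
# Two cells copied from the Markets column of the toy data
markets <- c("A=0.2934;B=0.1483;C=0.5583",
             "N=82.4;C=13.9;P=1.1;Q=1.0;R=0.5;S=0.3;T=0.3;U=0.1")

# Split each cell on ";" to get one "key=value" string per piece
pieces <- strsplit(markets, ";", fixed = TRUE)
pieces[[1]]

# Number of pairs per cell; the maximum is the number of columns needed
n_pairs <- lengths(pieces)
n_pairs
```

The first cell yields three pairs and the second yields eight, so any wide layout has to pad the shorter rows with missing values.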

Now extract what comes after the equals sign.

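# Split every "key=value" string and keep the value (second piece);
# empty padding cells have no second piece and become NA
tmp <- lapply(Markets, function(x) strsplit(x, "="))
tmp <- lapply(tmp, function(lst)
  sapply(lst, function(x) if (length(x) > 1) x[[2]] else NA))
tmp <- lapply(tmp, as.numeric)
# cbind (not rbind) keeps one row per row of df and one column per position
Markets <- do.call(cbind, tmp)

# Same steps for Sectors
tmp <- lapply(Sectors, function(x) strsplit(x, "="))
tmp <- lapply(tmp, function(lst)
  sapply(lst, function(x) if (length(x) > 1) x[[2]] else NA))
tmp <- lapply(tmp, as.numeric)
Sectors <- do.call(cbind, tmp)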
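The two blocks above repeat the same steps, and the question mentions many more such variables. One possible way to avoid the duplication is a small helper. This is a sketch rather than part of the original answer: `extract_values` is a hypothetical name, and it splits the strings directly instead of going through read.csv2().

```r
# Hypothetical helper (not from the original answer): takes a character
# vector of "key=value;key=value" cells and returns a numeric matrix with
# one row per cell and one column per position, padded with NA.
extract_values <- function(cells) {
  pairs <- strsplit(cells, ";", fixed = TRUE)
  width <- max(lengths(pairs))
  rows <- lapply(pairs, function(p) {
    # Drop everything up to and including the first "=" in each pair
    vals <- as.numeric(sub("^[^=]*=", "", p))
    # Pad shorter rows with NA so all rows have the same length
    c(vals, rep(NA_real_, width - length(vals)))
  })
  do.call(rbind, rows)
}

cells <- c("A=0.2934;B=0.1483;C=0.5583", "D=0.8548;E=0.0869")
m <- extract_values(cells)
m
```

With a helper like this, each additional variable would need only a single call, for example `extract_values(df$Markets)`.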

The temporary variable tmp is no longer needed.

rm(tmp)

Finally, tidy up the results by converting them to data frames and naming the columns.

Markets <- as.data.frame(Markets)
Sectors <- as.data.frame(Sectors)

names(Markets) <- paste("Market", seq_along(Markets), sep = ".")
names(Sectors) <- paste("Sector", seq_along(Sectors), sep = ".")

Markets
Sectors
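The steps above keep only the numbers and drop the names on the left of each = sign. Since the question asks about tidyverse functions and about rows as an option, a long layout that keeps both parts is also possible. The following is a minimal sketch, not the answer's method, and it assumes the tidyr and tibble packages are installed; `df_small`, `market`, and `share` are names chosen here for illustration.

```r
library(tidyr)

# Two rows of the toy data, reduced to the Markets column for brevity
df_small <- tibble::tribble(
  ~var1, ~year, ~Markets,
  "AA", 2015, "A=0.2934;B=0.1483;C=0.5583",
  "BB", 2015, "D=0.8548;E=0.0869;A=0.0529"
)

# Step 1: one row per "key=value" pair
step1 <- separate_rows(df_small, Markets, sep = ";")

# Step 2: split each pair into a name column and a value column
long <- separate(step1, Markets, into = c("market", "share"), sep = "=")

# Convert the value explicitly, so that names such as "T" or "F"
# are never at risk of being guessed as logicals
long$share <- as.numeric(long$share)
long
```

In the toy data, Markets and Sectors hold different numbers of pairs per row, so each column would likely need its own pass of this kind, with the results kept as separate long tables keyed by var1 and year.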

Regarding "r - Tidying data whose variables contain many groups/pairs of values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54840165/
