
r - Create one variable from multiple variables

Reposted. Author: 行者123. Updated: 2023-12-01 13:12:19

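I am trying to clean a dataset and create 3 variables named Adventure, Action, and Comedy. The original dataset has 3000 observations (imported under the name dat). Only a few observations are shown here: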

id    Runtime        Genres                                       
37 75 animation, adventure, family, fantasy, musical
1 162 action, adventure, fantasy, sci_fi
95 126 action, fantasy
100 101 comedy, drama, fantasy
82 136 action, adventure, sci-fi
99 117 animation, adventure, comedy, family, sport
91 95 animation, comedy, crime, family

After importing the dataset into R, I used the following R code to split the genres into 5 columns:

dat1 <- dat %>% separate (Genres, c("Genres1","Genres2" ,"Genres3" ,"Genres4" ,"Genres5" ), sep=",", extra = "drop", fill = "right")


id Runtime Genres1 Genres2 Genres3 Genres4 Genres5
37 75 animation adventure family fantasy musical
1 162 action adventure fantasy sci_fi
95 126 action fantasy
100 101 comedy drama fantasy
82 136 action adventure sci-fi
99 117 animation adventure comedy family sport
91 95 animation comedy crime family

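How can I collapse all the genres into one category each for Action, Adventure, and Comedy?

I tried the following code.

First I created an empty column for adventure: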

dat1 ["adventure"] <- NA

dat1$adventure <- ifelse(dat1$Genres1=="adventure",1,(ifelse(dat1$Genres2=="adventure",1,0)))

It was suggested that I shorten the code to

  dat1$adventure <- ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure" | dat1$Genres3=="adventure" | dat1$Genres4=="adventure" ),1, 0)


id Runtime Genres1 Genres2 Genres3 Genres4 Genres5 Adventure
37 75 animation adventure family fantasy musical 0
1 162 action adventure fantasy sci_fi 0
95 126 action fantasy 0
100 101 comedy drama fantasy 0
82 136 action adventure sci-fi 0
99 117 animation adventure comedy family sport 0
91 95 animation comedy crime family 0

The code picks up adventure from Genres1 but returns zero for Genres2.
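One likely cause, assuming the columns were produced by the separate(..., sep=",") call shown earlier: splitting on a bare comma leaves a leading space in every column after the first (" adventure" rather than "adventure"), and short rows are padded with NA, which makes == return NA. The following is a minimal base R sketch of a fix; the small data frame is a hypothetical stand-in for dat1, and genre_cols is a name introduced only for illustration.

```r
# Hypothetical stand-in for dat1: note the leading spaces and the NA padding
dat1 <- data.frame(
  Genres1 = c("animation", "action", "comedy"),
  Genres2 = c(" adventure", " fantasy", " drama"),
  Genres3 = c(" family", NA, " fantasy"),
  stringsAsFactors = FALSE
)

genre_cols <- c("Genres1", "Genres2", "Genres3")

# Strip surrounding whitespace so " adventure" becomes "adventure"
dat1[genre_cols] <- lapply(dat1[genre_cols], trimws)

# Count matches per row across all genre columns, ignoring NA cells,
# then turn "at least one match" into a 0/1 indicator
matches <- rowSums(dat1[genre_cols] == "adventure", na.rm = TRUE)
dat1$adventure <- as.integer(matches > 0)
```

Because the comparison runs over all genre columns at once, the same two lines work unchanged for 3000 rows and for any other genre name.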

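I have re-edited this question. I tried some of the suggestions but am not sure how to proceed, since there are 3000 observations.

After running the suggestion

I formed a vector of the genre list and assigned it to dat2: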

dat2 <- c( "adventure", "comedy", "action", "drama", "animation", "fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror", "musical","history", "war", "documentary", "biography")

table(factor(dat2))

     action   adventure   animation   biography      comedy documentary       drama 
          1           1           1           1           1           1           1 
     family     fantasy     history      horror     musical     mystery     romance 
          1           1           1           1           1           1           1 
     sci-fi    thriller         war 
          1           1           1 

Creating the function

 fun1 <- function("adventure", "comedy", "action", "drama", "animation",
"fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror",
"musical","history", "war", "documentary", "biography")) {
vector_of_cur_genres <- seperate(i, sep = ", ")
result <- table(factor(vector_of_cur_genres, dat2))
return(result)
}

# Results

fun1 <- function("adventure", "comedy", "action", "drama",
"animation", "fantasy", "mystery", "family", "sci-fi", "thriller",
"romance", "horror", "musical","history", "war", "documentary",
"biography")) {
Error: unexpected string constant in "fun1 <- function("adventure""
> vector_of_cur_genres <- separate(i, sep = ", ")
Error: Please supply column name
> result <- table(factor(vector_of_cur_genres, dat2))
Error in factor(vector_of_cur_genres, dat2) :
object 'vector_of_cur_genres' not found
> return(result)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"

mat <- mapply(fun1,dat2$Genres)
Error in match.fun(FUN) : object 'fun1' not found

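Best answer

You can combine table and factor to get the result you want. First, make sure every genre is spelled exactly the same way each time ("Adventure" != "adventure"). Then create a vector containing all possible genres, c("Adventure", "Comedy", "Drama", ...).

Then, for each row, call table(factor(genres, list_of_possible_genres)), which returns a table of counts. You can then construct a matrix with something like this: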
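The spelling check matters for this dataset, since the sample rows contain both "sci_fi" and "sci-fi". A minimal sketch of one way to normalise the strings in base R follows; the first two strings are copied from the question's sample, the third is a hypothetical capitalised variant, and genres_clean is an illustrative name.

```r
# Two genre strings from the question's sample plus one hypothetical capitalised row
genres <- c("action, adventure, fantasy, sci_fi",
            "action, adventure, sci-fi",
            "Comedy, Drama, Fantasy")

# Lower-case everything, then map the "sci_fi" variant onto "sci-fi"
genres_clean <- gsub("sci_fi", "sci-fi", tolower(genres), fixed = TRUE)
```

Other variants found in the full 3000 rows would need their own replacement; only the one visible in the sample is handled here.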

mat <- mapply(
  function(i) {
    table(factor(separate(i, ...), list_of_possible_genres))
  },
  df$Genres)
# you want to use the original data frame after import

new.df <- cbind(df, mat) # they should both have the same number of rows here

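Make the ... in the separate call the same as in your original function. If you have any questions about what the individual functions or steps do, I can explain in the comments.

I defined a function inside the mapply call, function (i) ..., which is similar to defining a lambda in Python. The function takes a string of genres and returns a named vector containing the number of times each possible genre occurs.

Edit: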
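To make the "named vector of counts" concrete, here is a small sketch for a single row. It assumes base R strsplit() as the splitting step, and a shortened, hypothetical list_of_possible_genres so the result stays readable.

```r
# Shortened, hypothetical list of levels (the real list would hold every genre)
list_of_possible_genres <- c("action", "adventure", "comedy", "fantasy")

# One value of the Genres column
one_row <- "action, adventure, fantasy"

# strsplit() returns a list; [[1]] extracts the character vector of genres
cur_genres <- strsplit(one_row, ", ", fixed = TRUE)[[1]]

# factor() with explicit levels keeps absent genres, so table() reports them as 0
counts <- table(factor(cur_genres, levels = list_of_possible_genres))
```

Here counts holds 1 for action, adventure, and fantasy, and 0 for comedy. The zero entries are what make every row produce a vector of the same length, which is what allows the per-row results to be stacked into a matrix.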

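fun1 <- function(string_of_genres) {
  # strsplit() is used here in place of separate(), which operates on
  # data frame columns rather than on a single string
  vector_of_cur_genres <- strsplit(string_of_genres, ", ")[[1]]
  result <- table(factor(vector_of_cur_genres, levels = list_of_possible_genres))
  return(result)
}
mat <- mapply(fun1, df$Genres)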
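Putting the pieces together for the three variables the question asks about, a hedged end-to-end sketch might look like the following. It assumes base R only; the data frame holds four rows copied from the question's sample, and wanted and count_genres are illustrative names. One detail to watch: applying a function that returns a fixed-length vector over n strings yields a matrix with one column per input row, so it is transposed before binding.

```r
# Four rows copied from the question's sample data
df <- data.frame(
  id = c(37, 1, 95, 100),
  Runtime = c(75, 162, 126, 101),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, adventure, fantasy, sci_fi",
             "action, fantasy",
             "comedy, drama, fantasy"),
  stringsAsFactors = FALSE
)

# Only the three genres of interest are used as factor levels;
# any other genre becomes NA in factor() and is dropped by table()
wanted <- c("adventure", "action", "comedy")

count_genres <- function(string_of_genres) {
  cur <- strsplit(string_of_genres, ", ", fixed = TRUE)[[1]]
  table(factor(cur, levels = wanted))
}

# sapply() gives a 3 x n matrix (genres in rows), so transpose to n x 3
mat <- t(sapply(df$Genres, count_genres, USE.NAMES = FALSE))

# Now both objects have one row per observation
new.df <- cbind(df, mat)
```

With the full genre vector as levels instead of wanted, the same code would produce one indicator column per genre.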

Regarding r - Create one variable from multiple variables, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38591569/
