
r - How to separate one column into multiple columns (complex column)

Reposted. Author: 行者123. Updated: 2023-12-04 10:28:00

I am trying to separate the "Grade" column into multiple columns, one per subject, each holding the grade for that subject.

    grade<-read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv",sep=";")

# Rename the column names

names(grade)<-c("Student_ID","Name","Venue","Grade")

head(grade)

# Separate `Grade` into `subject` variables and corresponding `Grade` columns
library(tidyverse)


df<- grade %>% separate(Grade,paste("V",1:7,sep="_"),sep=":")

head(df)

# It still does not separate `subject` and `grade` independently

# Here is what I want it to look like

new_df<-df[c(1:5),c(1:4)]

new_df<-data.frame(new_df, V2=c(1:5)) # the same for V2, 4, 5, 6, 7 to separate subject and grade

new_df

I have been trying with dplyr and stringr, but could not produce the result I expected.

Best answer

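Here is an attempt using the tidyverse package. After converting everything to character (i.e. grade[] <- lapply(grade, as.character)), we create a custom function that returns the sorted subject: grade pairs for each Student_ID. We then use unnest to make the data long, and use separate to split each pair into two columns, Subject and Grade. Finally, we spread to get one column per subject.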

library(tidyverse)

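# This function could definitely be more elegant or even avoided,
# but this is as far as my regex knowledge allows me to go

mysplit <- function(x) {
  # Split on ": " and on plain whitespace: subjects and grades alternate
  y <- strsplit(x, ':\\s+|\\s+')[[1]]
  # Re-join each subject with its grade
  z <- paste0(y[c(TRUE, FALSE)], ': ', y[c(FALSE, TRUE)])
  # Return the pairs sorted by subject name
  return(z[order(sub(':.*', '', z))])
}

# Convert every column to character (needed when read.csv() returned factors)
grade[] <- lapply(grade, as.character)

grade %>%
  mutate(Grade = lapply(Grade, mysplit)) %>%
  unnest(Grade) %>%
  separate(Grade, into = c('Subject', 'Grade'), sep = ': ') %>%
  spread(Subject, Grade)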

This splits it like so:

...     Biology Chemitry English Geography History Literature Math Physics
... 1 6.00 6.00 <NA> <NA> <NA> 7.50 4.25 6.80
... 2 5.80 6.00 <NA> <NA> <NA> 6.00 5.75 <NA>
... 3 <NA> <NA> <NA> 8.00 4.50 7.75 2.25 <NA>
... 4 <NA> <NA> <NA> 7.25 7.50 7.75 3.25 <NA>
... 5 <NA> <NA> <NA> 7.75 4.50 8.25 1.75 <NA>
... 6 <NA> 6.60 6.78 <NA> <NA> 7.00 8.75 8.40
.
.
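For readers who want to try the pipeline without downloading the CSV, here is a minimal sketch on a made-up two-row data frame. The `toy` data and its values are assumptions for illustration, not rows from the real file. It also swaps `spread` for `pivot_wider`, which tidyr (1.0.0 and later) offers as the newer equivalent; that substitution is mine, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real CSV (values are made up)
toy <- data.frame(
  Student_ID = c("S1", "S2"),
  Grade = c("Math: 4.25 Literature: 7.50", "Math: 5.75 History: 6.00"),
  stringsAsFactors = FALSE
)

# Same helper as in the answer: split, re-pair, sort by subject
mysplit <- function(x) {
  y <- strsplit(x, ":\\s+|\\s+")[[1]]
  z <- paste0(y[c(TRUE, FALSE)], ": ", y[c(FALSE, TRUE)])
  z[order(sub(":.*", "", z))]
}

wide <- toy %>%
  mutate(Grade = lapply(Grade, mysplit)) %>%   # list column of "Subject: grade" strings
  unnest(Grade) %>%                            # one row per student and subject
  separate(Grade, into = c("Subject", "Grade"), sep = ": ") %>%
  pivot_wider(names_from = Subject, values_from = Grade)

print(wide)
```

Subjects a student did not take come out as NA, as in the output above. The grades are still character strings at this point, so a conversion with as.numeric would be needed before doing arithmetic on them.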

To better understand the function, break it down step by step. Say x is the following:

x
#[1] "Math: 4.25 Literature: 7.50 Physics: 6.80 Chemitry: 6.00 Biology: 6.00"

Splitting on every space, or on a colon followed by a space, gives the following vector:

y <- strsplit(x, ':\\s+|\\s+')[[1]]
y
#[1] "Math" "4.25" "Literature" "7.50" "Physics" "6.80" "Chemitry" "6.00" "Biology" "6.00"

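Paste it back together, taking the odd-position elements (the subjects, y[c(TRUE, FALSE)]) and the even-position elements (the grades, y[c(FALSE, TRUE)]), with ': ' as the separator: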

z <- paste0(y[c(TRUE, FALSE)], ': ', y[c(FALSE, TRUE)])
z
#[1] "Math: 4.25" "Literature: 7.50" "Physics: 6.80" "Chemitry: 6.00" "Biology: 6.00"
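The odd/even selection works because R recycles a logical index along the whole vector. A minimal sketch on a made-up vector (the values are illustrative only):

```r
# Illustrative vector in which subjects and grades alternate (made-up values)
y <- c("Math", "4.25", "Literature", "7.50", "Physics", "6.80")

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE, ...
subjects <- y[c(TRUE, FALSE)]   # elements 1, 3, 5
grades   <- y[c(FALSE, TRUE)]   # elements 2, 4, 6

# The same selection written with explicit positions
subjects_explicit <- y[seq(1, length(y), by = 2)]
grades_explicit   <- y[seq(2, length(y), by = 2)]

print(subjects)
print(grades)
```

This relies on subjects and grades alternating strictly. A subject name containing a space (say, a hypothetical "Physical Education") would be split into two elements and shift every later pair.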

Finally, it outputs the vector sorted by subject name (the word extracted with sub(':.*', '', z)):

z[order(sub(':.*', '', z))]
#[1] "Biology: 6.00" "Chemitry: 6.00" "Literature: 7.50" "Math: 4.25" "Physics: 6.80"

As @rosscova pointed out, the strings do not need to be sorted, which simplifies things a lot (no function is needed after all), i.e.

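# Split on whitespace that follows a digit. The lookbehind (?<=[0-9]) keeps
# the digit in the grade; a plain '[0-9]\\s+' would drop it.
grade %>%
  mutate(Grade = strsplit(Grade, '(?<=[0-9])\\s+', perl = TRUE)) %>%
  unnest(Grade) %>%
  separate(Grade, into = c('Subject', 'Grade'), sep = ': ') %>%
  spread(Subject, Grade)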
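One detail worth checking when splitting on "a digit followed by whitespace": strsplit removes everything the pattern matches, so a pattern that includes the digit itself drops the last digit of every grade except the final one. A minimal base R sketch on a made-up string (values are illustrative; the names `consumed` and `kept` are mine) compares that with a lookbehind, which matches only the whitespace:

```r
# Made-up example string in the same "Subject: grade" layout
x <- "Math: 4.25 Literature: 7.50 Physics: 6.80"

# The digit is part of the match, so it is removed along with the whitespace
consumed <- strsplit(x, "[0-9]\\s+")[[1]]

# Lookbehind: the digit must precede the whitespace but is not part of the match
kept <- strsplit(x, "(?<=[0-9])\\s+", perl = TRUE)[[1]]

print(consumed)
print(kept)
```

The lookbehind needs perl = TRUE, because R's default regex engine does not support that syntax.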

Regarding "r - How to separate one column into multiple columns (complex column)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44984925/
