r - Extracting category strings from columns into new columns, using only tidyverse, in R

Reposted. Author: 行者123. Updated: 2023-12-04 09:54:51

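I am trying to create new columns, using only the tidyverse library, based on the categories (strings) found in Comorbidity_one, Comorbidity_two, Comorbidity_three, and so on. I intend to use the new columns for logistic regression, so each new column, named after a string found in those columns, should be binary: 0 for absent, 1 for present. For example, Comorbidity_one contains "Asthma (managed with an inhaler)", which may or may not appear in the following columns. "Asthma (managed with an inhaler)" should therefore become a new column, with 1 for patients who have this comorbidity and 0 for patients who do not. Likewise, Obesity may appear in Comorbidity_two, and it should become a new column flagging patients with obesity. And so on.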

This is the kind of table I have:

test <- structure(
list(
ID = c("1",
"2", "3",
"4", "5",
"6"),
Chills = c("No", "Mild", "No", "Mild", "No", "No"),
Cough = c("No", "Severe", "No", "Mild", "Mild", "No"),
Diarrhoea = c("No", "Mild", "No", "No", "No", "No"),
Fatigue = c("No", "Moderate", "Mild", "Mild", "Mild", "Mild"),
Headcahe = c("No", "No", "No", "Mild", "No", "No"),
`Loss of smell and taste` = c("No", "No", "No", "No", "No", "No"),
`Muscle Ache` = c("No", "Moderate", "No", "Moderate", "Mild", "Mild"),
`Nasal Congestion` = c("No", "No", "No", "No", "Mild", "No"),
`Nausea and Vomiting` = c("No", "No",
"No", "No", "No", "No"),
`Shortness of Breath` = c("No",
"Mild", "No", "No", "No", "Mild"),
`Sore Throat` = c("No",
"No", "No", "No", "Mild", "No"),
Sputum = c("No", "Mild",
"No", "Mild", "Mild", "No"),
Temperature = c("No", "No",
"No", "No", "No", "37.5-38"),
Comorbidity_one = c(
"Asthma (managed with an inhaler)",
"None",
"Obesity",
"High Blood Pressure (hypertension)",
"None",
"None"
),
Comorbidity_two = c("Diabetes Type 2", NA,
NA, "Obesity", NA, NA),
Comorbidity_three = c(
"Asthma (managed with an inhaler)",
"None",
"Obesity",
"High Blood Pressure (hypertension)",
"None",
NA_character_
),
Comorbidity_four = c(
"Asthma (managed with an inhaler)",
"None",
"High Blood Pressure (hypertension)",
NA_character_,
NA_character_,
NA_character_
),
Comorbidity_five = c(
"Asthma (managed with an inhaler)",
"None",
NA_character_,
NA_character_,
NA_character_,
NA_character_
),
Comorbidity_six = c(
NA_character_,
NA_character_,
NA_character_,
NA_character_,
NA_character_,
NA_character_
),
Comorbidity_seven = c(
NA_character_,
NA_character_,
NA_character_,
NA_character_,
NA_character_,
NA_character_
),
Comorbidity_eight = c(
"High Blood Pressure (hypertension)",
NA_character_,
NA_character_,
NA_character_,
NA_character_,
NA_character_
),
Comorbidity_nine = c(
NA_character_,
NA_character_,
NA_character_,
"High Blood Pressure (hypertension)",
NA_character_,
"High Blood Pressure (hypertension)"
)
),
row.names = c(NA,-6L),
class = c("tbl_df",
"tbl", "data.frame")
)

Best answer

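Here is one approach.

First, use pivot_longer on your comorbidity columns so that each row holds a single comorbidity. Then drop the NA values and any duplicated comorbidities.

After that, use pivot_wider to create one column per comorbidity, set to 1 where the comorbidity is present, and use values_fill so that absent comorbidities become 0 rather than NA.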

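library(tidyverse)

test %>%
  # One row per patient and comorbidity slot
  pivot_longer(cols = starts_with("Comorbidity"), names_to = "Comorbidity_Count", values_to = "Comorbidity") %>%
  # Drop empty slots
  drop_na(Comorbidity) %>%
  # The slot name is no longer needed; without it, repeated comorbidities become duplicate rows
  select(-Comorbidity_Count) %>%
  distinct() %>%
  # Marker value that becomes the 1 in each indicator column
  mutate(Condition = 1) %>%
  # One column per comorbidity; absent ones are filled with 0 instead of NA
  pivot_wider(id_cols = -c(Comorbidity, Condition), names_from = Comorbidity, values_from = Condition, values_fill = list(Condition = 0))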

Output
# A tibble: 6 x 19
ID Chills Cough Diarrhoea Fatigue Headcahe `Loss of smell a… `Muscle Ache` `Nasal Congesti… `Nausea and Vom… `Shortness of B… `Sore Throat` Sputum Temperature `Asthma (manage… `Diabetes Type … `High Blood Pre… None Obesity
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 No No No No No No No No No No No No No 1 1 1 0 0
2 2 Mild Severe Mild Moderate No No Moderate No No Mild No Mild No 0 0 0 1 0
3 3 No No No Mild No No No No No No No No No 0 0 1 0 1
4 4 Mild Mild No Mild Mild No Moderate No No No No Mild No 0 0 1 0 1
5 5 No Mild No Mild No No Mild Mild No No Mild Mild No 0 0 0 1 0
6 6 No No No Mild No No Mild No No Mild No No 37.5-38 0 0 1 1 0
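
To make the intermediate long form easier to see, here is a minimal sketch on a made-up three-patient table. The names `toy`, `toy_long`, `toy_wide`, and `toy_model` are hypothetical and not part of the original data. It applies the same pivot_longer, distinct, and pivot_wider steps, then shows one possible way to drop the `None` indicator, assuming you do not want it as a predictor.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three patients, two comorbidity columns.
toy <- tibble(
  ID = c("1", "2", "3"),
  Comorbidity_one = c("Asthma", "None", "Obesity"),
  Comorbidity_two = c("Asthma", NA, "Diabetes")
)

# Long form: one row per patient and distinct, non-missing comorbidity.
toy_long <- toy %>%
  pivot_longer(
    cols = starts_with("Comorbidity"),
    names_to = "Comorbidity_Count",
    values_to = "Comorbidity"
  ) %>%
  drop_na(Comorbidity) %>%
  select(-Comorbidity_Count) %>%
  distinct()

# Wide form: one 0/1 indicator column per comorbidity.
toy_wide <- toy_long %>%
  mutate(Condition = 1) %>%
  pivot_wider(
    names_from = Comorbidity,
    values_from = Condition,
    values_fill = list(Condition = 0)
  )

# Assumption about the modelling goal: "None" is not wanted as a predictor.
# It is removed after widening, so patients whose only entry is "None"
# stay in the table with all indicators equal to 0.
toy_model <- toy_wide %>% select(-None)
```

Removing the column after widening, rather than filtering out the "None" rows beforehand, matters here: a patient whose only entry is "None" would otherwise have no rows left in the long form and would disappear from the result.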

Regarding "r - Extracting category strings from columns into new columns, using only tidyverse, in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61941331/
