
r - How to split a column into two columns based on a condition

Reposted. Author: 行者123. Updated: 2023-12-04 09:21:38

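I have a dataset of coral measurements. Alongside each measurement, additional metadata was collected, including the position, or "Location", of the colony on the experimental module. I am trying to split the Location column of my data frame into horizontal and vertical components. Each location code is an alphanumeric entry in which the letter denotes the column (A-D) and the number denotes the row (1-4).

In many cases a coral sits on the boundary with the next row (e.g. A1_2) or the next column (e.g. A_B1), so the entry changes from one letter and one number to one letter and two numbers, or to two letters and one number.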

d <- structure(list(`Module #` = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("111", "112", "113", "114", "115",
"116", "211", "212", "213", "214", "215", "216"), class = "factor"),
Side = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("N", "S", "T"), class = "factor"), TimeStep = c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), Location = c("A1", "A1_2",
"A2", "A3", "A3_4", "A4", "B_C3", "B1", "B1_2", "B2"), Date = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), Year = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("17", "18"
), class = "factor"), Site = structure(c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_), .Label = c("HAN",
"WAI"), class = "factor"), Treatment = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), recruits = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Site_long = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Hanauma Bay",
"Waikiki"), class = "factor"), Shelter = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("High", "Low"
), class = "factor")), row.names = c(NA, 10L), class = "data.frame")

head(d)

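I want to end up with a data frame that has two new columns: one named "Column" and one named "Row". "Column" is the letter part of the location code and "Row" is the number part. Note that each Column value should be either 1 or 3 characters long (e.g. Column = A for A1_2, or Column = A_B for A_B1).

Best answer

We can use str_extract to extract each value separately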

library(tidyverse)

d %>%
  mutate(Column = str_extract(Location, "[A-Z]_?[A-Z]?"),
         Row = str_extract(Location, "[0-9]_?[0-9]?")) %>%
  select(Location, Column, Row)

#    Location Column Row
# 1        A1      A   1
# 2      A1_2      A 1_2
# 3        A2      A   2
# 4        A3      A   3
# 5      A3_4      A 3_4
# 6        A4      A   4
# 7      B_C3    B_C   3
# 8        B1      B   1
# 9      B1_2      B 1_2
# 10       B2      B   2
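
The optional pieces in the pattern above are independent of each other, so `[A-Z]_?[A-Z]?` would also pick up a trailing underscore from a hypothetical code such as `A_1`. A slightly stricter sketch groups the underscore with the character that follows it, so that both appear together or not at all. The vector `loc` and the names `col_part` and `row_part` are made up for illustration and are not part of the original data:

```r
library(stringr)

# Hypothetical sample codes covering the three formats described in the question
loc <- c("A1", "A1_2", "B_C3")

# "(_[A-Z])?" makes the underscore and the second letter optional as a unit;
# "(_[0-9])?" does the same for the second digit
col_part <- str_extract(loc, "[A-Z](_[A-Z])?")
row_part <- str_extract(loc, "[0-9](_[0-9])?")

data.frame(Location = loc, Column = col_part, Row = row_part)
```

For the codes shown in the question both patterns should give the same result; the grouping only matters if malformed entries are possible.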

Or use tidyr::extract to split the column with a single regular expression that contains one capture group per new column

d %>%
  extract(Location, into = c("Column", "Row"),
          regex = "([A-Z]_?[A-Z]?)([0-9]_?[0-9]?)")
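
By default `extract()` drops the source column once it has been split. If `Location` should stay in the result, `remove = FALSE` keeps it, and rows whose code does not match the regex get `NA` in the new columns. A small sketch on a made-up data frame (`toy` and its values, including the deliberately invalid `"bad"`, are illustrative only):

```r
library(tidyr)

# Hypothetical input; "bad" contains no uppercase letter or digit, so it cannot match
toy <- data.frame(Location = c("A1", "A3_4", "B_C3", "bad"),
                  stringsAsFactors = FALSE)

# remove = FALSE keeps the original Location column next to the new ones
res <- tidyr::extract(toy, Location, into = c("Column", "Row"),
                      regex = "([A-Z]_?[A-Z]?)([0-9]_?[0-9]?)",
                      remove = FALSE)
res
```

Calling the function as `tidyr::extract` is a defensive choice here, since other attached packages can export a function with the same name.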

We can also use base R sub with similar regular expressions to extract the values

d$Column <- sub("([A-Z]_?[A-Z]?).*", "\\1", d$Location)
d$Row <- sub("[A-Z]_?[A-Z]?([0-9]_?[0-9]?)", "\\1", d$Location)
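
One thing to keep in mind with `sub()` is that it returns its input unchanged when the pattern does not match, so a malformed code would be copied into both new columns unnoticed. A possible guard, sketched here with an anchored pattern and a made-up vector `loc`, is to test each code with `grepl()` first and fill non-matching entries with `NA`:

```r
# Hypothetical input; "??" stands in for a malformed location code
loc <- c("A1", "A1_2", "B_C3", "??")

# Group 1 is the column part and group 3 is the row part
# (groups 2 and 4 are the optional "_X" suffixes nested inside them)
pattern <- "^([A-Z](_[A-Z])?)([0-9](_[0-9])?)$"

ok <- grepl(pattern, loc)
Column <- ifelse(ok, sub(pattern, "\\1", loc), NA_character_)
Row    <- ifelse(ok, sub(pattern, "\\3", loc), NA_character_)

data.frame(Location = loc, Column = Column, Row = Row)
```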

Regarding "r - How to split a column into two columns based on a condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57015540/
