
r - Split month/day strings in R with `-`

Reposted. Author: 行者123. Updated: 2023-12-02 08:02:30

I have a column like the following:

   fiscal_year_end
1             1231
2             1231
3             1231
4             1231
5              202
6             1231
7             1231
8              202
9             1231
10             927

They are month-day values, i.e. 12-31, 9-27, and 2-02.

I am trying to put them into that format, but I can't seem to get it right.

I tried str_replace_all(df$fiscal_year_end, "(?<=^\\d{2}|^\\d{4})", "-") from the stringr package, but the result did not come out as I expected.

Where am I going wrong?

Data:

structure(list(fiscal_year_end = c(1231L, 1231L, 1231L, 1231L, 
202L, 1231L, 1231L, 202L, 1231L, 927L, 228L, 1231L, 1231L, 1231L,
1231L, 928L, 1231L, 1231L, 930L, 1231L, 1231L, 628L, 1231L, 1231L,
1228L, 930L, 1231L, 1231L, 1231L, 1231L, 927L, 630L, 1231L, 202L,
1231L, 1231L, 1231L, 1231L, 927L, 930L, 1231L, 1231L, 1231L,
1231L, 228L, 928L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1228L, 1231L, 1231L, 1231L, 1231L,
131L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 831L, 1231L, 102L,
1231L, 1231L, 1231L, 1130L, 1231L, 1228L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 930L, 1031L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 203L, 1231L, 1231L, 1231L,
1231L, 1231L, 1229L, 1231L, 1231L, 1231L, 426L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 202L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1229L, 1231L, 1231L, 630L,
1231L, 1231L, 1209L, 1231L, 1231L, 1231L, 728L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 630L, 1231L, 1231L, 1231L, 1231L,
1231L, 1231L, 727L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L,
1231L, 630L, 1231L, 1231L, 1231L, 1130L, 1231L, 1231L, 1231L,
1231L, 1231L, 1231L, 1231L, 930L, 930L, 1231L, 1231L, 331L, 1231L,
1231L, 1231L, 1231L, 1231L, 1231L, 1231L, 1031L, 1229L, 1231L,
1231L, 1231L, 201L, 1231L, 1231L, 1231L, 1231L, 1231L, 1231L,
831L, 630L, 831L)), row.names = c(NA, -200L), class = "data.frame")

Edit:

     datadate fiscal_year_end
1  2012-08-31             831
2  2017-01-31             201
3  1999-12-31            1231
4  2009-02-28             228
5  2010-12-31            1231
6  2005-12-31            1231
7        <NA>             630
8  2010-09-30             928
9  2009-09-30             930
10 2018-01-31             201
11 2017-12-31            1231
12 2004-12-31            1231

Best answer

We can use separate after formatting the values to 4 digits with leading zeros

library(dplyr)
library(tidyr)
df1 %>%
  mutate(fiscal_year_end = sprintf("%04d", fiscal_year_end)) %>%
  separate(fiscal_year_end, c("month", "day"), sep = 2)
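
To see what the zero-padding step does on its own, here is a minimal base R sketch. The vector `x` is a hypothetical three-value sample of the column from the question, and joining the two halves with `-` is an assumption about the final format the asker wants; it is not part of the answer itself.

```r
# Hypothetical sample of fiscal_year_end values taken from the question
x <- c(1231L, 202L, 927L)

# Pad every value to 4 digits so the month always occupies the first two
padded <- sprintf("%04d", x)

# Insert "-" between the two-digit month and the two-digit day
# (assumed target format)
month_day <- sub("^(\\d{2})(\\d{2})$", "\\1-\\2", padded)

print(padded)     # "1231" "0202" "0927"
print(month_day)  # "12-31" "02-02" "09-27"
```

Because every string has exactly four characters after padding, a fixed split position of 2 is safe for all rows.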

Or use a negative index in separate, which counts the split position from the right end of the string

df1 %>%
  separate(fiscal_year_end, c("month", "day"), sep = -2)
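
With a negative position the last two characters always become the day, however long the month part is, so no padding is needed. The same idea can be sketched in base R with `substr` and `nchar`; `x` is a hypothetical sample of the column, not the full data.

```r
# Hypothetical sample of fiscal_year_end values, converted to character
x <- as.character(c(1231L, 202L, 927L))

# Length of each string; the day is always the last two characters
n <- nchar(x)
month <- substr(x, 1, n - 2)  # everything before the last two characters
day   <- substr(x, n - 1, n)  # the last two characters

parts <- data.frame(month = month, day = day, stringsAsFactors = FALSE)
print(parts)
```

Note that the month keeps its original width here ("2" and "9" rather than "02" and "09"), since nothing pads it.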

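Or, using only base R, we can use sub to insert a delimiter before the last two digits (a single capture group is enough) and then read the result into a two-column data.frame with read.csv
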
out <- read.csv(text = sub("(\\d{2})$", ",\\1", df1[[1]]), header = FALSE,
col.names = c("month", "day"), stringsAsFactors = FALSE)

head(out, 5)
# month day
#1 12 31
#2 12 31
#3 12 31
#4 12 31
#5 2 2
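
In this approach `sub` only inserts a delimiter before the last two digits; `read.csv` then does the splitting and parses both columns as integers, which is why row 5 shows 2 and 2 rather than 02. A minimal sketch of the intermediate strings on a hypothetical three-value sample follows; the `-` variant is an assumption about the format the question asks for.

```r
# Hypothetical sample of fiscal_year_end values taken from the question
x <- c(1231L, 202L, 927L)

# sub() coerces the integers to character and puts a delimiter
# in front of the final two digits
with_comma <- sub("(\\d{2})$", ",\\1", x)  # input that read.csv would parse
with_dash  <- sub("(\\d{2})$", "-\\1", x)  # assumed month-day target format

print(with_comma)  # "12,31" "2,02" "9,27"
print(with_dash)   # "12-31" "2-02" "9-27"
```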

Regarding "r - Split month/day strings in R with `-`", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55538619/
