r - I'm trying to use stringr, specifically regex, to split "MA: Bristol County (25005)"


Reposted by: 行者123 · Updated: 2023-12-02 19:06:31


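I'm trying to take one variable column and split it into several columns. The values follow a basic pattern, but the county names vary in length and format.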

State-county :
[1] "MA: Bristol County (25005)"
[2] "LA: St. Tammany Parish (22103)"
[3] "CA: Ventura County (06111)"    
[4] "CA: San Mateo County (06081)"

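I need state, county name, and county code columns that I can add back to the data.frame. I've been trying to figure out how to do this with str_extract. Ideally this is what I'd end up with, but I'll take any help I can get.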

  state:    county:            county code: 
[1] "MA"   Bristol County       25005
[2] "LA"   St. Tammany Parish   22103
[3] "CA"   Ventura County       06111    
[4] "CA"   San Mateo County     06081
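The desired output keeps codes such as 06111 with their leading zero, which is a reason to store the county code as character rather than as a number. A short sketch, assuming the codes have already been extracted as strings (the vector `codes` is illustrative):

```r
# County codes as extracted from the strings (character)
codes <- c("25005", "22103", "06111", "06081")

# Converting to a number silently drops the leading zero
as.integer(codes)
#> [1] 25005 22103  6111  6081

# If a numeric version is needed, sprintf() can pad it back to five digits
padded <- sprintf("%05d", as.integer(codes))
padded
#> [1] "25005" "22103" "06111" "06081"
```

The five-digit codes look like U.S. FIPS county codes (two-digit state code plus three-digit county code), so treating them as identifiers rather than quantities seems the safer default.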

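For the county code I was able to use some code I found, str_extract_all( "(?<=\\().+?(?=\\))") (thanks to Nettle). The closest I got for the state abbreviation is str_extract_all( h,"..:"), which is close but includes the ":". I also tried: str_extract_all( "(?<=\\:")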
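The lookaround approach from the question can be completed for all three fields. A minimal sketch, assuming stringr is installed and the strings sit in a character vector (`x` is an illustrative name). A lookahead such as `(?=:)` checks for the colon without including it in the match. Note that str_extract() returns one match per string as a plain vector, whereas str_extract_all() returns a list.

```r
library(stringr)

# Example strings from the question (the name `x` is illustrative)
x <- c("MA: Bristol County (25005)", "LA: St. Tammany Parish (22103)",
       "CA: Ventura County (06111)", "CA: San Mateo County (06081)")

# State: capital letters at the start, up to but not including the ":"
state <- str_extract(x, "^[A-Z]+(?=:)")

# County: text after ": " and before " (" (both excluded by the lookarounds)
county <- str_extract(x, "(?<=: ).+(?= \\()")

# County code: the question's lookbehind/lookahead pattern, applied to x
code <- str_extract(x, "(?<=\\().+?(?=\\))")

data.frame(state, county, code)
```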

Sorry if this isn't the best formatting; I tried to be as clear as possible, following the style I've seen here.

Best answer

Using str_match_all:

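library(tidyverse)  # loads stringr, dplyr, tidyr, tibble and purrr (set_names)
df <- data.frame(State_county = c("MA: Bristol County (25005)", "LA: St. Tammany Parish (22103)",
                                  "CA: Ventura County (06111)", "CA: San Mateo County (06081)"))

str_match_all(df$State_county, "([A-Z]+): ([^()]+) \\((\\d+)\\)")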

as_tibble(df) %>%
 mutate(matches=str_match_all(State_county, "([A-Z]+): ([^()]+) \\((\\d+)\\)")) %>%
  unnest_wider(matches) %>%
   select(-2) %>%
    set_names("State_county", "State", "County", "ZIP")
# A tibble: 4 x 4
  State_county                   State County             ZIP  
  <fct>                          <chr> <chr>              <chr>
1 MA: Bristol County (25005)     MA    Bristol County     25005
2 LA: St. Tammany Parish (22103) LA    St. Tammany Parish 22103
3 CA: Ventura County (06111)     CA    Ventura County     06111
4 CA: San Mateo County (06081)   CA    San Mateo County   06081

### OR with str_match, since each string contains only one match
## this avoids the warning produced by unnest_wider
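as_tibble(df) %>%
  mutate(matches = str_match(State_county, "([A-Z]+): ([^()]+) \\((\\d+)\\)"),
         State = matches[, 2],   # group 1: state abbreviation
         County = matches[, 3],  # group 2: county name
         ZIP = matches[, 4],     # group 3: county code
         matches = NULL)         # drop the temporary matrix column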
# A tibble: 4 x 4
  State_county                   State County             ZIP  
  <fct>                          <chr> <chr>              <chr>
1 MA: Bristol County (25005)     MA    Bristol County     25005
2 LA: St. Tammany Parish (22103) LA    St. Tammany Parish 22103
3 CA: Ventura County (06111)     CA    Ventura County     06111
4 CA: San Mateo County (06081)   CA    San Mateo County   06081 
### Another way 
str_match(df$State_county, "([A-Z]+): ([^()]+) \\((\\d+)\\)") %>%
 as.data.frame %>% set_names("State_county", "State", "County", "County_code")
                    State_county State             County County_code
1     MA: Bristol County (25005)    MA     Bristol County       25005
2 LA: St. Tammany Parish (22103)    LA St. Tammany Parish       22103
3     CA: Ventura County (06111)    CA     Ventura County       06111
4   CA: San Mateo County (06081)    CA   San Mateo County       06081
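Because the goal is to add the pieces back to the data frame, tidyr's extract() is another option that was not part of the original answer: it applies the same regex and writes each capture group to a new column. A sketch assuming tidyr is installed; recent tidyr releases mark extract() as superseded in favour of separate_wider_regex(), so treat this as one possible route rather than the recommended one.

```r
library(tidyr)

# Data from the question, as a plain character column
df <- data.frame(State_county = c("MA: Bristol County (25005)",
                                  "LA: St. Tammany Parish (22103)",
                                  "CA: Ventura County (06111)",
                                  "CA: San Mateo County (06081)"))

# One new column per capture group, named by `into`.
# remove = FALSE keeps State_county; convert defaults to FALSE,
# so the codes stay character and "06111" keeps its leading zero.
out <- tidyr::extract(df, State_county,
                      into = c("State", "County", "County_code"),
                      regex = "([A-Z]+): ([^()]+) \\((\\d+)\\)",
                      remove = FALSE)
out
```

Calling it as tidyr::extract() avoids a clash with magrittr::extract() if magrittr is attached later.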

Explanation:

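str_match returns a character matrix: the first column is the full string matched by the whole pattern, and each further column holds one capture group (a sub-pattern written inside unescaped parentheses, such as ([A-Z]+)).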

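[A-Z]+ : matches the state abbreviation.
[^()]+ : matches anything that is not a parenthesis, i.e. the county name.
\\((\\d+)\\) : matches the opening parenthesis \\( and the closing one \\), while the group (\\d+) captures the digits between them: the county code.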
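To see how these pieces fit together, the same pattern can be written in stringr's comments mode (regex(..., comments = TRUE)), where each part gets its own annotation. A sketch assuming stringr is installed; in comments mode unescaped whitespace is ignored, so the literal spaces are escaped.

```r
library(stringr)

# The answer's pattern, one part per line; "#" starts a comment in this mode
pattern <- regex("
  ([A-Z]+)    # group 1: state abbreviation, e.g. LA
  :\\         # colon followed by an escaped (literal) space
  ([^()]+)    # group 2: county name, any run of non-parenthesis characters
  \\ \\(      # literal space and opening parenthesis
  (\\d+)      # group 3: county code digits
  \\)         # closing parenthesis
", comments = TRUE)

m <- str_match("LA: St. Tammany Parish (22103)", pattern)
m
```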

str_match(df$State_county, "([A-Z]+): ([^()]+) \\((\\d+)\\)")
     [,1]                             [,2] [,3]                 [,4]   
[1,] "MA: Bristol County (25005)"     "MA" "Bristol County"     "25005"
[2,] "LA: St. Tammany Parish (22103)" "LA" "St. Tammany Parish" "22103"
[3,] "CA: Ventura County (06111)"     "CA" "Ventura County"     "06111"
[4,] "CA: San Mateo County (06081)"   "CA" "San Mateo County"   "06081"
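For comparison, a base-R sketch (no stringr) that builds the same kind of matrix with regexec() and regmatches(). This is an alternative technique, not the answer's method.

```r
# Example strings from the question (the name `x` is illustrative)
x <- c("MA: Bristol County (25005)", "LA: St. Tammany Parish (22103)",
       "CA: Ventura County (06111)", "CA: San Mateo County (06081)")
pattern <- "([A-Z]+): ([^()]+) \\((\\d+)\\)"

# regexec() records where the pattern and each group match;
# regmatches() extracts the text: full match first, then the groups
parts <- regmatches(x, regexec(pattern, x))

# Stack the per-string vectors into a matrix shaped like str_match()'s output
m <- do.call(rbind, parts)
result <- data.frame(State = m[, 2], County = m[, 3], County_code = m[, 4])
result
```

One difference worth knowing: a string that does not match yields character(0), which rbind() silently drops, so the rows would no longer line up with x; str_match() returns a row of NA instead.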

A similar question, "r - I'm trying to use stringr, specifically regex, to split "MA: Bristol County (25005)"", can be found on Stack Overflow: https://stackoverflow.com/questions/64979357/
