
r - How to split addresses (with inconsistent formatting) into separate fields in R

Reposted. Author: 行者123. Updated: 2023-12-02 21:17:16

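I want to split these addresses into their respective categories (street number, street name, city, state, and zip code) so that I can eventually check which ones are the same. Can anyone help with a basic idea of how to do this in R?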

    Company                          Address

1. A 1 NE 1 Street Miami,FL 33132
2. B 1 1st Street Miami,FL 33132
3. C 1 NE 1st St Miami,FL 33132
4. D 1 1st Street Miami,FL 33134
5. E 100 Biscayne Blvd. Miami,FL 33132
6. F 100 Biscayne Blvd Miami ,FL 33132
7. G 100 Biscayne Boulevard Suite 604 Miami,FL 33132
8. H 100 Biscayne Blvd. Suite 604 Miami,FL 33132
9. I 100 N. Biscayne Blvd. Miami,FL 33132

Best answer

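Try read.pattern from the gsubfn package. If the lines are in a file, replace text = Lines with a string giving the file name. This is likely to be fairly fragile, and you may need to tweak the regular expression once you have more data to try it on.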

Lines <- "Company                          Address
1. A 1 NE 1 Street Miami,FL 33132
2. B 1 1st Street Miami,FL 33132
3. C 1 NE 1st St Miami,FL 33132
4. D 1 1st Street Miami,FL 33134
5. E 100 Biscayne Blvd. Miami,FL 33132
6. F 100 Biscayne Blvd Miami ,FL 33132
7. G 100 Biscayne Boulevard Suite 604 Miami,FL 33132
8. H 100 Biscayne Blvd. Suite 604 Miami,FL 33132
9. I 100 N. Biscayne Blvd. Miami,FL 33132"

library(gsubfn)

# Each parenthesized group in the pattern becomes one column;
# skip = 1 drops the header line.
DF <- read.pattern(text = Lines,
                   pattern = "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$",
                   skip = 1,
                   as.is = TRUE,
                   col.names = c("No", "Street", "City", "State", "Zip"))

giving:

> DF
   No                       Street  City State   Zip
1   1                  NE 1 Street Miami    FL 33132
2   1                   1st Street Miami    FL 33132
3   1                    NE 1st St Miami    FL 33132
4   1                   1st Street Miami    FL 33134
5 100               Biscayne Blvd. Miami    FL 33132
6 100                Biscayne Blvd Miami    FL 33132
7 100 Biscayne Boulevard Suite 604 Miami    FL 33132
8 100     Biscayne Blvd. Suite 604 Miami    FL 33132
9 100            N. Biscayne Blvd. Miami    FL 33132
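
Once the fields are separated, the comparison the question asks about could be sketched roughly as follows. The snippet rebuilds a few rows of DF by hand so that it runs without gsubfn. The helper normalize_street and its abbreviation rules are hypothetical and only illustrate the idea; they are not part of the original answer, and real data would need a fuller set of rules.

```r
# Rows copied from the DF printed above (a subset, to keep the example short).
DF <- data.frame(
  No     = c(1L, 1L, 100L, 100L, 100L),
  Street = c("NE 1 Street", "NE 1st St", "Biscayne Blvd.",
             "Biscayne Blvd", "Biscayne Boulevard Suite 604"),
  City   = "Miami",
  State  = "FL",
  Zip    = 33132L,
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case, drop periods, shorten a few street
# suffixes, and strip ordinal endings. These rules are assumptions.
normalize_street <- function(x) {
  x <- tolower(x)
  x <- gsub(".", "", x, fixed = TRUE)
  x <- gsub("\\bstreet\\b", "st", x, perl = TRUE)
  x <- gsub("\\bboulevard\\b", "blvd", x, perl = TRUE)
  x <- gsub("\\b(\\d+)(st|nd|rd|th)\\b", "\\1", x, perl = TRUE)  # "1st" -> "1"
  trimws(gsub("\\s+", " ", x))
}

# One comparison key per row, built from the normalized fields.
DF$Key <- paste(DF$No, normalize_street(DF$Street), DF$City, DF$State, DF$Zip)

# TRUE for every row whose key occurs more than once.
DF$HasTwin <- DF$Key %in% DF$Key[duplicated(DF$Key)]
```

Under these rules, "NE 1 Street" and "NE 1st St" share a key, as do "Biscayne Blvd." and "Biscayne Blvd", while the Suite 604 row stays distinct. Whether suite numbers or directional prefixes such as NE or N. should count as differences depends on the data and is left open here.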

Here is the regular expression, visualized:

\S+ \S+ *(\d+) (.*) (\S+) ?,(\S+) (\d+)$

[Figure: regular expression visualization (Debuggex Demo)]
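
Because the pattern is fragile, it may help to check which lines fail to match before trusting the result. The sketch below uses only base R (regexec and regmatches) instead of read.pattern. The sample vector is an assumption for illustration, and its third line is made up and deliberately malformed.

```r
pattern <- "\\S+ \\S+ *(\\d+) (.*) (\\S+) ?,(\\S+) (\\d+)$"

lines <- c(
  "1. A 1 NE 1 Street Miami,FL 33132",
  "6. F 100 Biscayne Blvd Miami ,FL 33132",
  "10. J 200 Main Street Miami FL"   # hypothetical line: no comma, no zip
)

# regexec() finds the match positions; regmatches() returns the full match
# followed by each capture group, or character(0) when a line does not match.
m <- regmatches(lines, regexec(pattern, lines))
matched <- lengths(m) > 0

# Lines the pattern cannot handle; these would need a revised regex.
unmatched_lines <- lines[!matched]

# Bind the capture groups (dropping the full match) into a data frame.
fields <- do.call(rbind, lapply(m[matched], function(x) x[-1]))
parsed <- data.frame(fields, stringsAsFactors = FALSE)
names(parsed) <- c("No", "Street", "City", "State", "Zip")
```

All fields come back as character here, so a zip code with a leading zero would keep it; convert columns explicitly (for example with as.integer) where numbers are needed.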

Regarding "r - How to split addresses (with inconsistent formatting) into separate fields in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29907802/
