
r - Zip code distance in R

Reposted. Author: 行者123. Updated: 2023-12-02 06:48:49

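I am using the zipcode package in R, and I want to list all zip codes within a 10, 20, or X mile radius of each zip code. From there I will roll up the zip code data to 10, 20, or X mile totals. I am currently joining every zip code to every zip code (so the row count is squared), then calculating the distance between each pair, then eliminating distances greater than 10, 20, or X miles. Is there a better way to do this in R so that I don't have to calculate every possibility? I am new to R. Thanks!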

Code is here:
#Bringing in Zipcode database.
library(zipcode)
data(zipcode)

#Limiting to certain states that I want to include,
SEZips <- zipcode[zipcode$state %in% c("GA","AL", "SC", "NC"),]

#Duplicating the data set to join it together
SEZips2 <- SEZips

#To code in SQL
library(sqldf)

#Creating a common match so I can join all rows from both tables together
SEZips$Match <- 1
SEZips2$Match <- 1

#attaches every zip code to each zip
ZipList <- sqldf("
SELECT
A.zip as zip1,
A.longitude as lon1,
A.latitude as lat1,
B.zip as zip2,
B.longitude as lon2,
B.latitude as lat2
From SEZips A
Left Join SEZips2 B
on A.Match = B.Match
")


#to get the distance calculation, use package geosphere,
library(geosphere)

#radius of Earth in miles, adjust for km, etc.
r = 3959
#Creating Table of the coordinates. Makes it easy to calc distance
Points1 <- cbind(ZipList$lon1,ZipList$lat1)
Points2 <- cbind(ZipList$lon2,ZipList$lat2)
distance <- distHaversine(Points1,Points2,r)

#Adding distance back on to the original ZipList
ZipList$Distance <- distance

#To limit to a certain radius, e.g. 15 for 15 miles.
z <- 15
#Eliminating matches > z
ZipList2 <- ZipList[ZipList$Distance <= z,]

#Adding data to roll up, e.g. population
ZipPayroll <- read.csv("filepath/ZipPayroll.csv")

#Changing zip from integer to a 5-character string.
#formatC left-pads with zeros to width 5 (e.g. 1234 becomes "01234")
ZipPayroll$Zip2 <- formatC(ZipPayroll$zip, width = 5, format = "d", flag = "0")

#Joining Payroll info to SEZips dataframe
SEZips <- sqldf("
SELECT
A.*,
B.Payroll,
B.Employees,
B.Establishments
From SEZips A
Left Join ZipPayroll B
on A.zip = B.Zip2
")

#Rolling up to 15 mile level
SEZips15 <- sqldf("
SELECT
A.zip1 as Zip,
Sum(B.Payroll) as PayrollArea,
Sum(B.Employees) as EmployeesArea,
Sum(B.Establishments) as EstablishmentsArea
From ZipList2 A
Left Join SEZips B
on A.zip2 = B.zip
Group By A.zip1
")

#Include the original Zip data
#(B.Establishments keeps its own name; A.* already carries EstablishmentsArea)
SEZips15 <- sqldf("
SELECT
A.*,
B.Payroll,
B.Employees,
B.Establishments
From SEZips15 A
Left Join SEZips B
on A.zip = B.zip
")

#Calculate Average Pay for Zip and Area
SEZips15$AvgPayArea <- SEZips15$PayrollArea / SEZips15$EmployeesArea
SEZips15$AvgPay <- SEZips15$Payroll / SEZips15$Employees

Best answer

I have added a solution below using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), so they are very fast.

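The function spatialrisk::points_in_circle() finds the observations that lie within a given radius of a center point. Note that distances are calculated using the haversine formula.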
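For reference, the haversine formula is usually written as follows. This is the textbook form, not something taken from the package source. Here \varphi is latitude and \lambda is longitude, both in radians, and r is the Earth's radius (the question's code uses 3959 miles):

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2\,r\,\arcsin\!\left(\sqrt{a}\right)
```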

library(spatialrisk)
library(tidyverse)

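# x = center longitude, y = center latitude, z = zip code of the center
zips_within_radius <- function(x,y,z) {
  # radius is taken to be in meters here (10000 m is about 6.2 miles);
  # for a radius in miles, multiply the miles by 1609.344
  points_in_circle(SEZips, x, y, lon = longitude, lat = latitude, radius = 10000) %>%
    mutate(source_zip = z)
}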

Since each element of the output is a data frame, purrr::pmap_dfr is used to row-bind them together:

pmap_dfr(list(SEZips$longitude, SEZips$latitude, SEZips$zip), zips_within_radius)
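For readers without spatialrisk installed, the same per-center pattern can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: haversine_miles() and zips_within_miles() are hypothetical helper names, the toy coordinates are made up, and 3959 is the Earth radius in miles taken from the question's code. It still evaluates every pair of points, but it filters one center at a time, so the full squared table is never held in memory.

```r
# Haversine distance in miles from one point to a vector of points.
# Inputs are in decimal degrees; r is the Earth's radius in miles.
haversine_miles <- function(lon1, lat1, lon2, lat2, r = 3959) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: for each row of `zips` (columns zip, longitude,
# latitude), keep only the rows that lie within `miles` of it.
zips_within_miles <- function(zips, miles) {
  per_center <- lapply(seq_len(nrow(zips)), function(i) {
    d <- haversine_miles(zips$longitude[i], zips$latitude[i],
                         zips$longitude, zips$latitude)
    keep <- d <= miles
    data.frame(zip1 = rep(zips$zip[i], sum(keep)),
               zip2 = zips$zip[keep],
               Distance = d[keep],
               stringsAsFactors = FALSE)
  })
  # Row-bind the per-center results into one data frame
  do.call(rbind, per_center)
}

# Toy data: made-up coordinates, not real zip codes
toy <- data.frame(
  zip = c("00001", "00002", "00003"),
  longitude = c(-84.00, -84.10, -86.00),
  latitude = c(33.00, 33.00, 33.00),
  stringsAsFactors = FALSE
)

pairs <- zips_within_miles(toy, miles = 15)
print(pairs)
```

The result has the same zip1/zip2/Distance shape as ZipList2 in the question, so the later roll-up queries could be run against it unchanged.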

Regarding "r - Zip code distance in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31429275/
