gpt4 book ai didi

r - 将点平均在一起而不重复并减少最终数据帧

转载 作者:行者123 更新时间:2023-12-05 05:44:33 28 4
gpt4 key购买 nike

目标是对 10 米内的点进行平均,而不重复平均中的任何点,将点数据帧减少为平均点,理想情况下,沿着收集这些点的路线获得平稳的点流。这是一个来自更大文件(25,000 个观察值)的 11 点子集示例数据框:

library(sf)
df <- data.frame(trait = as.numeric(c(91.22,91.22,91.22,91.58,91.47,92.19,92.19,90.57,90.57,91.65,91.65)),
datetime = as.POSIXct(c("2021-08-06 15:08:43","2021-08-06 15:08:44","2021-08-06 15:08:46","2021-08-06 15:08:47","2021-08-06 15:43:17","2021-08-06 15:43:18","2021-08-06 15:43:19","2021-08-06 15:43:20","2021-08-06 15:43:21","2021-08-06 15:43:22","2021-08-06 15:43:23")),
lat = c(39.09253, 39.09262, 39.09281, 39.09291, 39.09248, 39.09255, 39.09261, 39.09266, 39.0927, 39.09273, 39.09274),
lon = c(-94.58463, -94.58462, -94.5846, -94.58459, -94.58464, -94.58464, -94.58464, -94.58464, -94.58466, -94.5847, -94.58476)
) # just to add some value that is plotable
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
df <- st_as_sf(x = df,
coords = c("lon", "lat"),
crs = projcrs)

这是我尝试过的:

  1. st_is_within_distance(trav, trav, tolerance) 的多次迭代,包括:
  2. 一个aggregate method显示在这里。这些方法不起作用,因为相同的点会被多次平均。
  3. 通过尝试在 lapply 中动态更新列表,接近 filteracross 但最终没有成功。
  4. This is helpful来自@jeffreyevans,但并没有真正解决问题,而且有点过时了。
  5. spThin 包不起作用,因为它是为更具体的变量而设计的。
  6. 我想 cluster using this post ,但集群会抛出随机点,实际上并没有有效地减少数据帧。

这是我所得到的最接近的。同样,此解决方案的问题在于它在收集平均值时重复了点,这使得某些点比其他点具有更高的权重。

  # first set tolerance
tolerance <- 20 # 20 meters

# get distance between points
i <- st_is_within_distance(df, df, tolerance)

# filter for indices with more than 1 (self) neighbor
i <- i[which(lengths(i) > 1)]

# filter for unique indices (point 1, 2 / point 2, 1)
i <- i[!duplicated(i)]

# points in `sf` object that have no neighbors within tolerance
no_neighbors <- trav[!(1:nrow(df) %in% unlist(i)), ]

# iterate over indices of neighboring points
avg_points <- lapply(i, function(b){
df <- df[unlist(b), ]
coords <- st_coordinates(df)

df <- df %>%
st_drop_geometry() %>%
cbind(., coords)

df_sum <- df %>%
summarise(
datetime = first(datetime),
trait = mean(trait),
X = mean(X),
Y = mean(Y),
.groups = 'drop') %>%
ungroup()

return(df)
}) %>%

bind_rows() %>%
st_as_sf(coords = c('X', 'Y'),
crs = "+proj=longlat +datum=WGS84 +no_defs ")

最佳答案

另一个答案,使用 sf::aggregate() 和六边形网格来查找彼此之间特定距离内的点。也可以使用正方形网格。结果会有所不同,具体取决于网格相对于点的确切位置,但在确定平均值时不应多次使用任何点。

步骤大纲:

  • 加载数据,转换为 crs 5070 以进行以米为单位的测量
  • 获取数据的边界框
  • 为每个直径约 10 米的边界框制作一个六边形网格
  • 使用 mean 聚合落在同一个六边形中的点
  • 加入原始数据
library(sf)
library(tidyverse)

set.seed(22) # might be needed to get same hex grid?

#### your sample data
df <- data.frame(trait = as.numeric(c(91.22,91.22,91.22,91.58,91.47,92.19,92.19,90.57,90.57,91.65,91.65)),
datetime = as.POSIXct(c("2021-08-06 15:08:43","2021-08-06 15:08:44","2021-08-06 15:08:46","2021-08-06 15:08:47","2021-08-06 15:43:17","2021-08-06 15:43:18","2021-08-06 15:43:19","2021-08-06 15:43:20","2021-08-06 15:43:21","2021-08-06 15:43:22","2021-08-06 15:43:23")),
lat = c(39.09253, 39.09262, 39.09281, 39.09291, 39.09248, 39.09255, 39.09261, 39.09266, 39.0927, 39.09273, 39.09274),
lon = c(-94.58463, -94.58462, -94.5846, -94.58459, -94.58464, -94.58464, -94.58464, -94.58464, -94.58466, -94.5847, -94.58476)
) # just to add some value that is plotable
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
df <- st_as_sf(x = df,
coords = c("lon", "lat"),
crs = projcrs) %>%
st_transform(5070) ### transform to 5070 for a projection in meters
#### end sample data


# Get a bounding box as an sf object to make a grid
bbox <- st_bbox(df) %>% st_as_sfc()

# Make a grid as hexagons with approximately the right size
# area ~86m; side ~5.75m; long diag ~11.5m
hex_grid <- st_make_grid(bbox, cellsize = 10, square = F) %>% st_as_sf()

# Aggregate mean of the hexagonal grid
hex_agg <- aggregate(df ,
hex_grid,
mean,
join = st_contains) %>% filter(!is.na(trait))

# Assign the mean of the hexagon to points that fall
# within each hexagon
df_agg <- st_join(df, hex_agg)

head(df_agg) # trait.x from df, trait.y from the mean by hexagon
#> Simple feature collection with 6 features and 4 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 121281.6 ymin: 1786179 xmax: 121285.3 ymax: 1786227
#> Projected CRS: NAD83 / Conus Albers
#> trait.x datetime.x trait.y datetime.y
#> 1 91.22 2021-08-06 15:08:43 91.70500 2021-08-06 15:26:00
#> 2 91.22 2021-08-06 15:08:44 91.32667 2021-08-06 15:31:47
#> 3 91.22 2021-08-06 15:08:46 91.22000 2021-08-06 15:08:46
#> 4 91.58 2021-08-06 15:08:47 91.58000 2021-08-06 15:08:47
#> 5 91.47 2021-08-06 15:43:17 91.47000 2021-08-06 15:43:17
#> 6 92.19 2021-08-06 15:43:18 91.70500 2021-08-06 15:26:00
#> geometry
#> 1 POINT (121282.5 1786184)
#> 2 POINT (121283.2 1786194)
#> 3 POINT (121284.6 1786216)
#> 4 POINT (121285.3 1786227)
#> 5 POINT (121281.7 1786179)
#> 6 POINT (121281.6 1786186)

sum(df_agg$trait.x) - sum(df_agg$trait.y) # original trait - aggregate trait should be 0, or near 0
#> [1] 0

ggplot(df_agg) +
geom_sf(aes(size = trait.x), alpha = .2, color = 'blue') + # Original triat
geom_sf(aes(size = trait.y), alpha = .2, color = 'red') + # New aggregated trait
theme_void()

按特征大小。蓝点是原始的,红色是新的空间均值。


## Plot of
# original points & hex grid used:
ggplot() +
geom_sf(data = df, color = 'red') +
geom_sf(data = hex_grid, fill = NA) +
theme_void()

绘图显示了均值点的分组。看起来每个六边形有 1、2 和 3 个点的组的平均值。

reprex package 创建于 2022-03-23 (v2.0.1)

编辑

更新为每个六边形只有一个点,丢失了一些原始点

## Edit for one point per hexagon:
hex_agg <- aggregate(df ,
hex_grid,
mean,
join = st_contains) %>% filter(!is.na(trait)) %>%
rownames_to_column('hex_num') # add hexagon number to group_by

## Guide to join on, has only hexagon number & centroid of contained points
hex_guide <- df_agg %>%
group_by(hex_num) %>%
summarise() %>%
st_centroid()

# The full sf object with only one point per hexagon
# this join isn't the most efficient, but slice(1) removes
# the duplicate data. You could clean df_agg before the join
# to resolve this
final_join <- df_agg %>%
st_drop_geometry() %>%
left_join(hex_guide, by = 'hex_num') %>%
group_by(hex_num) %>%
slice(1) %>%
ungroup() %>%
st_as_sf()

ggplot() +
geom_sf(data = final_join, color = 'red', size = 3) +
geom_sf(data = df, color = 'black', alpha = .5) +
geom_sf(data = hex_grid, color = 'blue', fill = NA)

该图显示了六边形、灰色的原始数据点和分组原始点的质心处的新红色点。每个六边形只有 1 个红点。

enter image description here

关于r - 将点平均在一起而不重复并减少最终数据帧,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/71579664/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com