r - Identifying observations that overlap in space and time

Reposted by: 行者123  Updated: 2023-12-03 18:20:35

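I have a data frame in which every row is a unique observation.
Observations overlap in time if they lie within a specified temporal distance of each other (e.g. 30 days).
Observations overlap in space if they lie within a specified spatial distance of each other (e.g. 20 km).
I am working with sets of observations that overlap in both time and space. I want to create a column (overlaps) that holds, for each observation, a vector of the IDs of the observations it overlaps with. I have tried the solution below, but its run time is too long for it to be practical.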

library(dplyr)
library(lubridate)
library(purrr)
library(geosphere)

spat_proximity <- function(x, y, z) {
  
  return(which(map_dbl(y, ~ distGeo(., x)) <= z))}


temp_proximity <- function(x, y, z) {
    
  return(which(map_dbl(y, ~ abs(x - .)) <= z))}


test %>%
  mutate(overlaps = map2(map(place, ~ spat_proximity(., place, 20000)),
                         map(time, ~ temp_proximity(., time, 30)),
                         ~ intersect(.x, .y)))

Any ideas on how to speed this up would be much appreciated.
Desired output


structure(list(id = 1:42, time = structure(c(1478601762, 1475170279, 
1469770219, 1462441336, 1474739469, 1488216507, 1475203721, 1468705558, 
1481722718, 1485897197, 1488669576, 1501288618, 1510266595, 1516828588, 
1497048175, 1516546144, 1507576242, 1517654363, 1496070298, 1519765220, 
1507408104, 1532046710, 1542196446, 1534747170, 1533605231, 1521381844, 
1545389880, 1537988628, 1544304998, 1524842149, 1551051077, 1540822870, 
1579775599, 1580337175, 1551486497, 1554879837, 1568620434, 1568701543, 
1556387550, 1561253396, 1582925482, 1562166384), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), place = list(c(7.59729413351368, 52.6052275122351
), c(9.99728447956781, 53.43773657253), c(10.1114473929533, 53.1295890148866
), c(7.74115218835801, 53.555354690339), c(9.82895066827581, 
53.1009319396015), c(10.061107415855, 53.1908752763309), c(10.1134381934544, 
53.1450558612239), c(8.59001735546083, 53.1767797285482), c(6.43939168487555, 
52.5520931654252), c(8.38811111096636, 53.9043055557574), c(6.20061916537948, 
52.462037409576), c(8.66656282486832, 52.8269702466929), c(9.92127490588442, 
53.1240045666796), c(9.77810957468704, 53.1445777603789), c(10.0972382106036, 
53.1604265989175), c(10.0473952445094, 53.1698097395641), c(9.23773401919961, 
53.2120381900218), c(8.29524237837988, 52.822332696399), c(6.63690696797941, 
53.4436726627048), c(6.89839325296288, 53.947454203445), c(6.97064542834721, 
54.2487197094445), c(9.98865072631714, 53.4088944299342), c(9.94164401569524, 
53.1500576073959), c(9.64242996587752, 52.9285470044703), c(10.1026940185685, 
53.1635394335485), c(9.94874529044194, 53.2202512735354), c(8.8025526552284, 
53.2423093779114), c(7.93352467761445, 52.9129105531343), c(6.6418846001424, 
53.2459031608081), c(7.56102465003101, 53.5306444680171), c(7.36619114998468, 
53.748869508885), c(7.40993284414052, 54.5367797663042), c(9.90022663895919, 
53.3726361099083), c(9.41110555596208, 52.5001044709056), c(10.1151193231519, 
53.1539029361817), c(10.1064400828529, 53.1793449776572), c(9.94235711256256, 
53.2622041055899), c(9.44215997717822, 53.4799339987572), c(7.03832846889284, 
53.1986115213435), c(7.32755360272354, 53.416700338513), c(7.57828611098173, 
53.6107027769073), c(7.55411005022882, 54.1905803935834)), overlaps = list(
    1L, 2L, 3L, 4L, c(5L, 7L), 6L, c(5L, 7L), 8L, 9L, 10L, 11L, 
    12L, 13L, c(14L, 16L), 15L, c(14L, 16L), 17L, 18L, 19L, 20L, 
    21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 
    33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L)), row.names = c(NA, 
-42L), class = c("tbl_df", "tbl", "data.frame"))

Data

structure(list(id = 1:42, time = structure(c(1478601762, 1475170279, 
1469770219, 1462441336, 1474739469, 1488216507, 1475203721, 1468705558, 
1481722718, 1485897197, 1488669576, 1501288618, 1510266595, 1516828588, 
1497048175, 1516546144, 1507576242, 1517654363, 1496070298, 1519765220, 
1507408104, 1532046710, 1542196446, 1534747170, 1533605231, 1521381844, 
1545389880, 1537988628, 1544304998, 1524842149, 1551051077, 1540822870, 
1579775599, 1580337175, 1551486497, 1554879837, 1568620434, 1568701543, 
1556387550, 1561253396, 1582925482, 1562166384), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), place = list(c(7.59729413351368, 52.6052275122351
), c(9.99728447956781, 53.43773657253), c(10.1114473929533, 53.1295890148866
), c(7.74115218835801, 53.555354690339), c(9.82895066827581, 
53.1009319396015), c(10.061107415855, 53.1908752763309), c(10.1134381934544, 
53.1450558612239), c(8.59001735546083, 53.1767797285482), c(6.43939168487555, 
52.5520931654252), c(8.38811111096636, 53.9043055557574), c(6.20061916537948, 
52.462037409576), c(8.66656282486832, 52.8269702466929), c(9.92127490588442, 
53.1240045666796), c(9.77810957468704, 53.1445777603789), c(10.0972382106036, 
53.1604265989175), c(10.0473952445094, 53.1698097395641), c(9.23773401919961, 
53.2120381900218), c(8.29524237837988, 52.822332696399), c(6.63690696797941, 
53.4436726627048), c(6.89839325296288, 53.947454203445), c(6.97064542834721, 
54.2487197094445), c(9.98865072631714, 53.4088944299342), c(9.94164401569524, 
53.1500576073959), c(9.64242996587752, 52.9285470044703), c(10.1026940185685, 
53.1635394335485), c(9.94874529044194, 53.2202512735354), c(8.8025526552284, 
53.2423093779114), c(7.93352467761445, 52.9129105531343), c(6.6418846001424, 
53.2459031608081), c(7.56102465003101, 53.5306444680171), c(7.36619114998468, 
53.748869508885), c(7.40993284414052, 54.5367797663042), c(9.90022663895919, 
53.3726361099083), c(9.41110555596208, 52.5001044709056), c(10.1151193231519, 
53.1539029361817), c(10.1064400828529, 53.1793449776572), c(9.94235711256256, 
53.2622041055899), c(9.44215997717822, 53.4799339987572), c(7.03832846889284, 
53.1986115213435), c(7.32755360272354, 53.416700338513), c(7.57828611098173, 
53.6107027769073), c(7.55411005022882, 54.1905803935834))), row.names = c(NA, 
-42L), class = c("tbl_df", "tbl", "data.frame"))

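Best answer

If you really want speed, you can write your own C++ code for the distance calculation (because geosphere is quite slow) and for the time comparison.

Example
Save this code in a file, e.g. "~/Desktop/find_overlaps.cpp". You need Rcpp installed: install.packages("Rcpp")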



#include "Rcpp.h"
#include <math.h>

static const double earth = 6378137.0; // WGS-84 equatorial radius, in metres

// haversine formula taken from the geodist library
// - https://github.com/hypertidy/geodist
double distance_haversine(double x1, double y1, double x2, double y2) {

  double cosy1 = cos( y1 * M_PI / 180.0 );
  double cosy2 = cos( y2 * M_PI / 180.0 );

  double sxd = sin ((x2 - x1) * M_PI / 360.0);
  double syd = sin ((y2 - y1) * M_PI / 360.0);
  double d = syd * syd + cosy1 * cosy2 * sxd * sxd;
  d = 2.0 * earth * asin (sqrt (d));
  return (d);
}

// returns true if second date is within 30 days of the first
bool within_days( int first_date, int second_date ) {
  int days = 30 * 24 * 60 * 60;
  int lower_bound = first_date - days;
  int upper_bound = first_date + days;

  return lower_bound <= second_date && second_date <= upper_bound;
}

bool within_distance( Rcpp::NumericVector start_place, Rcpp::NumericVector end_place, double distance_limit = 20000.0 ) {

  double x1 = start_place[0];
  double y1 = start_place[1];
  double x2 = end_place[0];
  double y2 = end_place[1];

  return distance_haversine(x1, y1, x2, y2) <= distance_limit;
}

// [[Rcpp::export]]
SEXP find_overlaps( Rcpp::NumericVector ids, Rcpp::IntegerVector dates, Rcpp::List place ) {

  R_xlen_t n = dates.length();
  R_xlen_t i, j;

  Rcpp::List res( n );

  R_xlen_t result_counter;
  for( i = 0; i < n; ++i ) {

    Rcpp::IntegerVector overlaps( n ); // initialise vector to store results
    result_counter = 0;

    for( j = 0; j < n; ++j ) {
      // ignore self-comparisons
      if( i != j ) {

        int first_date = dates[ i ];
        int second_date = dates[ j ];

        Rcpp::NumericVector first_place = place[ i ];
        Rcpp::NumericVector second_place = place[ j ];

        // check the place values exist
        if( first_place.length() != 2 || second_place.length() != 2 ) {
          continue;
        }

        if( within_days( first_date, second_date) && within_distance( first_place, second_place ) ) {
          overlaps[ result_counter ] = j;
          result_counter++;
        }
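      }
    }

    // once every j has been checked, store the matching ids for row i
    // (rows with no match are left as NULL)
    if( result_counter > 0 ) {
      Rcpp::IntegerVector id_idx = overlaps[ Rcpp::Range( 0, result_counter - 1 ) ];
      res[ i ] = ids[ id_idx ];
    }
  }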
  return res;
}

Then source it in R

library(Rcpp)

Rcpp::sourceCpp(file = "~/Desktop/find_overlaps.cpp")

# df is the data frame from the "Data" section (named test in the question)
res <- find_overlaps( df$id, df$time, df$place )

df$overlaps <- res

df
#    id                time               place overlaps
# 1   1 2016-11-08 10:42:42 7.597294, 52.605228     NULL
# 2   2 2016-09-29 17:31:19 9.997284, 53.437737     NULL
# 3   3 2016-07-29 05:30:19  10.11145, 53.12959     NULL
# 4   4 2016-05-05 09:42:16 7.741152, 53.555355     NULL
# 5   5 2016-09-24 17:51:09 9.828951, 53.100932        7
# 6   6 2017-02-27 17:28:27  10.06111, 53.19088     NULL
# 7   7 2016-09-30 02:48:41  10.11344, 53.14506        5
# 8   8 2016-07-16 21:45:58 8.590017, 53.176780     NULL
# 9   9 2016-12-14 13:38:38 6.439392, 52.552093     NULL
# 10 10 2017-01-31 21:13:17 8.388111, 53.904306     NULL
# 11 11 2017-03-04 23:19:36 6.200619, 52.462037     NULL
# 12 12 2017-07-29 00:36:58 8.666563, 52.826970     NULL
# 13 13 2017-11-09 22:29:55 9.921275, 53.124005     NULL
# 14 14 2018-01-24 21:16:28   9.77811, 53.14458       16
# 15 15 2017-06-09 22:42:55  10.09724, 53.16043     NULL
# 16 16 2018-01-21 14:49:04  10.04740, 53.16981       14
# 17 17 2017-10-09 19:10:42 9.237734, 53.212038     NULL
# 18 18 2018-02-03 10:39:23 8.295242, 52.822333     NULL
# 19 19 2017-05-29 15:04:58 6.636907, 53.443673     NULL
# 20 20 2018-02-27 21:00:20 6.898393, 53.947454     NULL
# ...
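
The two checks behind this output can also be tried outside R. What follows is a minimal standalone sketch (plain C++, no Rcpp) that reuses the same haversine formula and Earth radius as the code above. The names haversine_m, gap_days and overlaps are made up for this illustration and are not part of the answer's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Same radius as the answer's code (WGS-84 equatorial radius, metres).
static const double kEarthRadius = 6378137.0;
static const double kPi = 3.14159265358979323846;

// Haversine distance in metres; x = longitude, y = latitude, both in degrees.
double haversine_m(double x1, double y1, double x2, double y2) {
  double cosy1 = std::cos(y1 * kPi / 180.0);
  double cosy2 = std::cos(y2 * kPi / 180.0);
  double sxd = std::sin((x2 - x1) * kPi / 360.0);
  double syd = std::sin((y2 - y1) * kPi / 360.0);
  double d = syd * syd + cosy1 * cosy2 * sxd * sxd;
  return 2.0 * kEarthRadius * std::asin(std::sqrt(d));
}

// Absolute gap in days between two timestamps given as epoch seconds.
double gap_days(long long t1, long long t2) {
  return std::llabs(t1 - t2) / 86400.0;
}

// True if two observations are within both limits (hypothetical helper).
bool overlaps(double lon1, double lat1, long long t1,
              double lon2, double lat2, long long t2,
              double max_m = 20000.0, double max_days = 30.0) {
  assert(max_m >= 0.0 && max_days >= 0.0);
  return gap_days(t1, t2) <= max_days &&
         haversine_m(lon1, lat1, lon2, lat2) <= max_m;
}
```

For rows 5 and 7 of the example data, haversine_m comes out a little under 20 km and gap_days at roughly 5.4 days, so overlaps returns true for that pair, which agrees with the output above.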

Notes

I ran a quick benchmark, and it runs in a few microseconds on your example.

I deliberately ignored self-overlaps (hence the NULL values).
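
The C++ loop above still compares every pair of rows. For much larger inputs, one possible refinement (not part of the original answer, and only a sketch) is to sort the observations by time and stop scanning as soon as the time gap exceeds the limit, so that only pairs inside the time window reach the distance test. The Obs struct and find_overlaps_sorted are hypothetical names, and the code is plain C++ that would still need an Rcpp wrapper to be called from R.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// One observation: id, epoch seconds, longitude and latitude in degrees.
struct Obs { int id; long long time; double lon; double lat; };

// Haversine distance in metres, same formula and radius as the answer's code.
double great_circle_m(const Obs& a, const Obs& b) {
  const double pi = 3.14159265358979323846, earth = 6378137.0;
  double sxd = std::sin((b.lon - a.lon) * pi / 360.0);
  double syd = std::sin((b.lat - a.lat) * pi / 360.0);
  double d = syd * syd + std::cos(a.lat * pi / 180.0) *
                         std::cos(b.lat * pi / 180.0) * sxd * sxd;
  return 2.0 * earth * std::asin(std::sqrt(d));
}

// res[k] holds the ids of the observations that overlap obs[k] (self excluded).
std::vector<std::vector<int>> find_overlaps_sorted(
    const std::vector<Obs>& obs, double max_m = 20000.0,
    long long max_s = 30LL * 24 * 60 * 60) {
  assert(max_m >= 0.0 && max_s >= 0);
  const std::size_t n = obs.size();

  // Positions of the observations, ordered by time.
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return obs[a].time < obs[b].time;
  });

  std::vector<std::vector<int>> res(n);
  for (std::size_t a = 0; a < n; ++a) {
    const Obs& p = obs[order[a]];
    for (std::size_t b = a + 1; b < n; ++b) {
      const Obs& q = obs[order[b]];
      // Later entries are even further away in time, so stop here.
      if (q.time - p.time > max_s) break;
      if (great_circle_m(p, q) <= max_m) {
        res[order[a]].push_back(q.id);  // record the pair in both directions
        res[order[b]].push_back(p.id);
      }
    }
  }
  return res;
}
```

After the sort, the work is proportional to the number of pairs that fall inside the time window instead of all pairs. The ids within each result vector are not sorted, so sort them afterwards if order matters.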

Regarding "r - Identifying observations that overlap in space and time", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65715493/
