python - Is there an alternative way to speed up a brute-force 'tally' algorithm?


Reposted by: 行者123. Updated: 2023-11-30 09:30:43


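Apologies in advance if this is the wrong place to post this question. If there is a better Stack Exchange site, please let me know.

I am currently developing a crime-prediction algorithm that essentially lays a grid over a city and predicts whether each grid cell will become a hotspot (at least one assault crime) within the next 30 days.

I am currently using the city of Nashville, whose grid overlay has 3446 cells. I have a grid dataset containing everything needed to display the grid: the map coordinates of each cell and its surrounding neighbours (the neighbour below, the neighbour to the right, and so on).

Here is an example of a prediction:

In this example, green indicates a correct prediction. Red indicates a false negative and purple a false positive from the machine-learning algorithm.

To train my neural network, I use a feature set like the one shown below:

Here Hotspot is the target value (either 1 or 0). Week, month, and year are crime tallies drawn from last year's incidents (crimes that occurred in the last week, the last month, and the last year). My problem is that creating these feature sets takes a very long time (the script takes more than 6 hours).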

# Loop through each grid cell in the dataset
for grid_index, grid_row in grid.iterrows():
    print("On grid number: ", grid_row['id'])
    # Reset the tallies for this grid cell
    near = 0
    countWeek = 0
    countMonth = 0
    countYear = 0
    # Loop through all of the crimes
    for crime_index, crime_row in crime.iterrows():

        # Parse out the month, day, and year
        date = crime_row['Incident Occurred']
        date_pars = date.split('/')
        month = int(date_pars[0])
        day = int(date_pars[1])
        year = int(date_pars[2].split(' ')[0])

        # Count crimes that fall in any of the eight neighbouring cells
        if grid_row['top '] == crime_row['grid']:
            near += 1
        if grid_row['bottom '] == crime_row['grid']:
            near += 1
        if grid_row['left '] == crime_row['grid']:
            near += 1
        if grid_row['right '] == crime_row['grid']:
            near += 1
        if grid_row['topleft'] == crime_row['grid']:
            near += 1
        if grid_row['topright'] == crime_row['grid']:
            near += 1
        if grid_row['bottomright'] == crime_row['grid']:
            near += 1
        if grid_row['bottomleft'] == crime_row['grid']:
            near += 1

        # Count crimes in this cell for the last month, week, and year
        if month == 12 and grid_row['id'] == crime_row['grid']:
            countMonth = countMonth + 1
        if day >= 25 and month == 12 and grid_row['id'] == crime_row['grid']:
            countWeek = countWeek + 1

        if year == 2017 and grid_row['id'] == crime_row['grid']:
            countYear = countYear + 1

    # Update the output for the specific grid cell.
    # `output` is a DataFrame created earlier (not shown). Note that
    # DataFrame.append was removed in pandas 2.0; use pd.concat there.
    output = output.append({'Grid': grid_row['id'], 'Hotspot': 0, 'week': countWeek,
                            'month': countMonth, 'year': countYear, 'near': near},
                           ignore_index=True)

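This code loops through every grid cell (3446 in total) and, inside each cell, loops through every crime (about 18,000), computes the counts, and appends them to a pandas DataFrame. 3446 * 18000 is roughly 62 million iterations to create this dataset. I feel this shouldn't take too long, yet it takes far longer than is ideal.

Any ideas on how to speed this up effectively? I need to run this algorithm for every month of the past three years, so 36 runs at more than 5 hours each is far too long for my time constraints.

Thanks in advance for any insights.

Edit: To clarify, "grid_row" is each record in the grid CSV file whose columns I posted above (the location of each cell and of its neighbouring cells), and "crime_row" is each crime incident that occurred over the past year.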

Best answer

What you are doing boils down to

forall grid
  forall crimes
    if crime.cell == grid.cell
      do something

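which has complexity O(|grid| * |crimes|).

With 3k crimes and 5k grid cells, that is 15e6 iterations.

A better approach is to iterate over the crimes once and push each crime onto its associated grid cell, so that all crimes with the same grid_index are stacked in the same place.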

from collections import defaultdict

# Map each grid_index to the list of all crimes that occurred in that cell
gridIdxToCrimes = defaultdict(list)

# One pass over the crimes: O(|crimes|)
for crime_index, crime_row in crime.iterrows():
    gridIdxToCrimes[crime_row['grid']].append(crime_row)

# One pass over the grid cells: O(|grid|)
for grid_index, grid_row in grid.iterrows():
    near = 0
    topIndex = grid_row['top ']
    if topIndex in gridIdxToCrimes:
        # All the crimes in the cell above the current one
        near += len(gridIdxToCrimes[topIndex])
    # ... repeat for the other seven neighbour columns
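
The bucketing idea can be extended to all four features from the question. What follows is only a sketch under assumptions, not the answerer's code: the function name `tally`, its parameters, and the constant `NEIGHBOUR_KEYS` are hypothetical, rows are taken to be plain dicts with the column names used in the question, and dates are assumed to look like `MM/DD/YYYY ...`. It keeps counts rather than lists of crimes, since only the totals are needed.

```python
from collections import Counter

# Neighbour keys as spelled in the question (four of them end in a space).
NEIGHBOUR_KEYS = ['top ', 'bottom ', 'left ', 'right ',
                  'topleft', 'topright', 'bottomright', 'bottomleft']


def tally(grid_rows, crime_rows, year=2017, month=12, week_start_day=25):
    """Build one feature dict per grid cell in O(|crimes| + |grid|)."""
    total, in_week, in_month, in_year = Counter(), Counter(), Counter(), Counter()

    # Pass 1: visit every crime once and bump the counters of its cell.
    for crime in crime_rows:
        m, d, y = crime['Incident Occurred'].split(' ')[0].split('/')
        m, d, y = int(m), int(d), int(y)
        cell = crime['grid']
        total[cell] += 1
        if y == year:
            in_year[cell] += 1
        if m == month:
            in_month[cell] += 1
            if d >= week_start_day:
                in_week[cell] += 1

    # Pass 2: visit every grid cell once; a Counter returns 0 for unseen cells.
    features = []
    for g in grid_rows:
        features.append({
            'Grid': g['id'],
            'Hotspot': 0,
            'week': in_week[g['id']],
            'month': in_month[g['id']],
            'year': in_year[g['id']],
            'near': sum(total[g[key]] for key in NEIGHBOUR_KEYS),
        })
    return features
```

With the DataFrames from the question, this could presumably be called as `pd.DataFrame(tally(grid.to_dict('records'), crime.to_dict('records')))`, which also avoids appending to a DataFrame one row at a time.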

That way you do O(|crimes| + |grid|) = 8k iterations (3k + 5k) instead of 15e6.
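
For a sense of scale, the same comparison can be worked through with the sizes quoted in the question instead of round figures. This is just arithmetic; the variable names are illustrative.

```python
# Sizes quoted in the question.
n_grid = 3446
n_crimes = 18000

nested = n_grid * n_crimes   # one step per (grid cell, crime) pair
single = n_grid + n_crimes   # one pass over crimes plus one pass over cells

print(nested)            # 62028000
print(single)            # 21446
print(nested // single)  # 2892
```

So the two-pass version does on the order of a few thousand times fewer loop steps, before counting the per-row overhead of `iterrows` and of appending to a DataFrame.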

Regarding "python - Is there an alternative way to speed up a brute-force 'tally' algorithm?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59170206/
