python - Where can I improve my code to shorten its execution time?

Reposted by: 行者123  Last updated: 2023-12-04 12:30:42

Problem statement from HackerRank:

If the amount spent by a client on a particular day is greater than or equal to 2× the client's median spending for a trailing number of days, they send the client a notification about potential fraud. The bank doesn't send the client any notifications until they have at least that trailing number of prior days' transaction data.

Given the number of trailing days d and a client's total daily expenditures for a period of n days, determine the number of times the client will receive a notification over all n days.

My code solves the problem, but it exceeds the time limit on the large test cases. The code itself is quite short:

from statistics import median

first_multiple_input = input().rstrip().split()
n = int(first_multiple_input[0])
d = int(first_multiple_input[1])
expenditure = list(map(int, input().rstrip().split()))
count=0
for i in range(len(expenditure)-d):
    if expenditure[d+i] >= 2*median(expenditure[i:d+i]) :
        count+=1
print( count)

Please point out what causes the delay and how to improve it.

A small test case to help understand the code:

9 5                 expenditure[] size n =9, d = 5
2 3 4 2 3 6 8 4 5   expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]

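Best answer

Analysis / idea

Your median(expenditure[i:d+i]) is the culprit: every call sorts the whole unsorted slice of size d, which takes O(d log d) time. You can reduce that to O(log d) per step by keeping the current window of trailing elements in a sorted structure, for example a SortedList. You get the median from the middle one or two elements, and to update the window you just add the new element and remove the oldest one.

Implementation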

from sortedcontainers import SortedList

n = 9
d = 5
expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]

count = 0
trailing = SortedList(expenditure[:d])
half = d // 2
for i in range(d, n):
    median = (trailing[half] + trailing[~half]) / 2
    if expenditure[i] >= 2 * median:
        count += 1
    trailing.add(expenditure[i])
    trailing.remove(expenditure[i - d])
print(count)

We could omit the / 2 and the 2 *, but then "median" would be the wrong name, and naming things is hard. We could also write if expenditure[i] >= trailing[half] + trailing[~half], but I find that less clear.
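The trailing[half] and trailing[~half] pair can look cryptic at first. Since ~half equals -half - 1, the two indexes land on the same element when the window length is odd and on the two elements around the middle when it is even. A minimal sketch of just that indexing, where middle_pair is a hypothetical helper name used for illustration and not part of the answer's code:

```python
def middle_pair(window):
    """Return the two middle values of an already sorted window."""
    half = len(window) // 2
    # ~half == -half - 1: as far from the end as index half is from the start.
    return window[~half], window[half]


print(middle_pair([2, 2, 3, 3, 4]))  # odd length: (3, 3), the single middle value twice
print(middle_pair([1, 2, 3, 4]))     # even length: (2, 3), the values around the middle
```

Averaging the pair gives the median in both cases, which is why the loop needs no special handling for even d.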

Output

If you add

    print(f'{trailing=} {median=} {expenditure[i]=}')

right after the median = ... line, you can see what is happening:

trailing=SortedList([2, 2, 3, 3, 4]) median=3.0 expenditure[i]=6
trailing=SortedList([2, 3, 3, 4, 6]) median=3.0 expenditure[i]=8
trailing=SortedList([2, 3, 4, 6, 8]) median=4.0 expenditure[i]=4
trailing=SortedList([2, 3, 4, 6, 8]) median=4.0 expenditure[i]=5
2

Alternative implementation

Using zip instead of indexing:

count = 0
trailing = SortedList(expenditure[:d])
half = d // 2
for today, oldest in zip(expenditure[d:], expenditure):
    median = (trailing[half] + trailing[~half]) / 2
    if today >= 2 * median:
        count += 1
    trailing.add(today)
    trailing.remove(oldest)
print(count)

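Alternative data structure: sorted regular list

I found the problem at HackerRank, which doesn't have sortedcontainers. But the following got accepted there.

We can use a regular Python list and keep it sorted ourselves, with the help of sorted and bisect from Python's standard library: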

from bisect import bisect_left, insort

count = 0
trailing = sorted(expenditure[:d])
half = d // 2
for today, oldest in zip(expenditure[d:], expenditure):
    median = (trailing[half] + trailing[~half]) / 2
    if today >= 2 * median:
        count += 1
    insort(trailing, today)
    del trailing[bisect_left(trailing, oldest)]
print(count)

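Accessing the middle elements takes O(1) time, finding the insertion/deletion index takes O(log d) time, and the actual insertion/deletion takes O(d) time (because all elements to the right of the index have to be shifted). But that O(d) shifting happens at a low level and is very fast.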
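To gain some confidence that the sliding sorted list computes the same answer as the original slow loop, the two can be compared on random input. The following is a sketch under assumptions: the function names count_notifications and count_notifications_slow are made up here, and the random test sizes are arbitrary.

```python
import random
from bisect import bisect_left, insort
from statistics import median


def count_notifications(expenditure, d):
    """Sliding window kept as a sorted regular list (bisect approach)."""
    count = 0
    trailing = sorted(expenditure[:d])
    half = d // 2
    for today, oldest in zip(expenditure[d:], expenditure):
        window_median = (trailing[half] + trailing[~half]) / 2
        if today >= 2 * window_median:
            count += 1
        insort(trailing, today)                      # O(d) shift, done in C
        del trailing[bisect_left(trailing, oldest)]  # drop one copy of the oldest value
    return count


def count_notifications_slow(expenditure, d):
    """Reference version from the question: re-sorts every window."""
    return sum(
        expenditure[d + i] >= 2 * median(expenditure[i:d + i])
        for i in range(len(expenditure) - d)
    )


sample = [2, 3, 4, 2, 3, 6, 8, 4, 5]
print(count_notifications(sample, 5))  # 2, as in the traced output

# Cross-check on small random inputs (sizes chosen arbitrarily).
for _ in range(200):
    n = random.randint(1, 40)
    d = random.randint(1, n)
    data = random.choices(range(201), k=n)
    assert count_notifications(data, d) == count_notifications_slow(data, d)
```

Agreement on random inputs is not a proof, but it catches the typical off-by-one mistakes in the window update.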

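Two more: sorted bytearray and counting sort

The question originally didn't include the reference to HackerRank. Now that I see the values are limited to integers from 0 to 200, we can also use a bytearray: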

trailing = bytearray(sorted(expenditure[:d]))
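As a sketch of how that one-line change might fit into the full loop: the rest of the bisect-based code can stay the same, because bytearray supports insert, item deletion and indexing just like a list. This assumes every value fits in a single byte (0 to 255), which the 0 to 200 limit guarantees; count_notifications_bytearray is a name invented for this illustration.

```python
from bisect import bisect_left, insort


def count_notifications_bytearray(expenditure, d):
    """Same sliding-window idea, with the window stored as a sorted bytearray."""
    count = 0
    # bytearray raises ValueError for values outside 0..255.
    trailing = bytearray(sorted(expenditure[:d]))
    half = d // 2
    for today, oldest in zip(expenditure[d:], expenditure):
        # Indexing a bytearray yields plain ints, so the arithmetic is unchanged.
        window_median = (trailing[half] + trailing[~half]) / 2
        if today >= 2 * window_median:
            count += 1
        insort(trailing, today)
        del trailing[bisect_left(trailing, oldest)]
    return count


print(count_notifications_bytearray([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # 2
```

Each element takes one byte instead of a pointer to an int object, so the shifting done by insort and del moves far less memory.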

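As I just saw in the discussion there, for this range of allowed values we could also use a form of counting sort. I think a Fenwick tree would make that particularly fast; I might try that later.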
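A minimal sketch of the counting idea, not taken from the answer: keep one counter per possible value and find the middle elements by walking the counters. It assumes values lie in 0..max_value (200 here) and d >= 1; the names count_notifications_counting and kth_smallest are hypothetical. It uses a plain scan rather than a Fenwick tree, so each median lookup costs O(max_value).

```python
def count_notifications_counting(expenditure, d, max_value=200):
    """Sliding window stored as per-value counts instead of a sorted sequence."""
    counts = [0] * (max_value + 1)
    for value in expenditure[:d]:
        counts[value] += 1

    def kth_smallest(k):
        """Return the value at 0-based rank k in the current window."""
        seen = 0
        for value, occurrences in enumerate(counts):
            seen += occurrences
            if seen > k:
                return value

    # Ranks of the two middle elements; they coincide when d is odd.
    low_rank, high_rank = (d - 1) // 2, d // 2
    notifications = 0
    for today, oldest in zip(expenditure[d:], expenditure):
        twice_median = kth_smallest(low_rank) + kth_smallest(high_rank)
        if today >= twice_median:
            notifications += 1
        counts[today] += 1   # newest day enters the window
        counts[oldest] -= 1  # oldest day leaves it
    return notifications


print(count_notifications_counting([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # 2
```

Updating the window is O(1) here, and the cost per day no longer depends on d at all, only on the size of the value range.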

Benchmark

In the comments, you mentioned n=200000 and d=10122 as a large case. So I tested with data of that size:

import random

n = 200000
d = 10122
expenditure = random.choices(range(201), k=n)

Benchmark results for my solutions:

                       at replit.com   on my weak laptop
SortedList + indexing   ~1.8 seconds    ~6.4 seconds
SortedList + zipping    ~1.8 seconds    ~6.4 seconds
sorted regular list     ~0.6 seconds    ~8.8 seconds
sorted bytearray        ~0.3 seconds    ~1.7 seconds

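I'm not sure why the regular list solution is relatively slow on my laptop. I suspect it exceeds my CPU's level 1 cache.

Regarding "python - Where can I improve my code to shorten its execution time?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69281285/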
