gpt4 book ai didi

performance - 为什么在本例中使用序列比使用列表慢得多

转载 作者:行者123 更新时间:2023-12-03 12:04:33 24 4
gpt4 key购买 nike

背景:
我有一系列连续的,带时间戳的数据。
数据序列中有漏洞,有的很大,有的只是一个缺失值。
每当孔只是一个缺失值时,我都想使用虚拟值修补孔(较大的孔将被忽略)。

我想使用延迟生成的修补序列,因此使用了Seq.unfold。

我制作了两种方法来修补数据中的漏洞。

第一个使用带孔的数据的序列数据,并生成修补的序列。这就是我想要的,但是当输入序列中的元素数增加到1000以上时,这些方法的运行速度会非常慢,输入序列包含的元素越多,它的性能就会越差。

第二种方法使用带有孔的数据的列表,并生成修补的序列,并且运行速度很快。但是,这不是我想要的,因为这会强制在内存中实例化整个输入列表。

我想使用(sequence-> sequence)方法而不是(list-> sequence)方法,以避免同时将整个输入列表存储在内存中。

问题:

1)为什么第一种方法这么慢(随着输入列表的增加,情况变得越来越糟)
(我怀疑这与用Seq.skip 1重复创建新序列有关,但我不确定)

2)如何使用输入序列而不是输入列表来快速修补数据中的孔?

编码:

open System

// Method 1 (Slow)
let insertDummyValuesWhereASingleValueIsMissing1 (timeBetweenContiguousValues : TimeSpan) (values : seq<(DateTime * float)>) =
let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
(None, values) |> Seq.unfold (fun (prevValue, restOfValues) ->
if restOfValues |> Seq.isEmpty then
None // Reached the end of the input seq
else
let currentValue = Seq.hd restOfValues
if prevValue.IsNone then
Some(currentValue, (Some(currentValue), Seq.skip 1 restOfValues )) // Only happens to the first item in the seq
else
let currentTime = fst currentValue
let prevTime = fst prevValue.Value
let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) // 42 is chosen here for obvious reasons, making this comment superfluous
Some(dummyValue, (Some(dummyValue), restOfValues))
else
Some(currentValue, (Some(currentValue), Seq.skip 1 restOfValues))) // Either the two values were contiguous, or the gap between them was too large to patch

// Method 2 (Fast)
let insertDummyValuesWhereASingleValueIsMissing2 (timeBetweenContiguousValues : TimeSpan) (values : (DateTime * float) list) =
let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
(None, values) |> Seq.unfold (fun (prevValue, restOfValues) ->
match restOfValues with
| [] -> None // Reached the end of the input list
| currentValue::restOfValues ->
if prevValue.IsNone then
Some(currentValue, (Some(currentValue), restOfValues )) // Only happens to the first item in the list
else
let currentTime = fst currentValue
let prevTime = fst prevValue.Value
let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0)
Some(dummyValue, (Some(dummyValue), currentValue::restOfValues))
else
Some(currentValue, (Some(currentValue), restOfValues))) // Either the two values were contiguous, or the gap between them was too large to patch

// Test data
let numbers = {1.0..10000.0}
let contiguousTimeStamps = seq { for n in numbers -> DateTime.Now.AddMinutes(n)}

let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |> Seq.filter (fun (dateTime, num) -> num % 77.0 <> 0.0) // Has a gap in the data every 77 items

let timeBetweenContiguousValues = (new TimeSpan(0,1,0))

// The fast sequence-patching (method 2)
dataWithOccationalHoles |> List.of_seq |> insertDummyValuesWhereASingleValueIsMissing2 timeBetweenContiguousValues |> Seq.iter (fun pair -> printfn "%f %s" (snd pair) ((fst pair).ToString()))

// The SLOOOOOOW sequence-patching (method 1)
dataWithOccationalHoles |> insertDummyValuesWhereASingleValueIsMissing1 timeBetweenContiguousValues |> Seq.iter (fun pair -> printfn "%f %s" (snd pair) ((fst pair).ToString()))

最佳答案

Seq.skip构造一个新序列。我认为这就是为什么您的原始方法缓慢的原因。

我的第一个倾向是使用序列表达式和Seq.pairwise。这是快速且易于阅读的。

let insertDummyValuesWhereASingleValueIsMissingSeq (timeBetweenContiguousValues : TimeSpan) (values : seq<(DateTime * float)>) =
let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal
seq {
yield Seq.hd values
for ((prevTime, _), ((currentTime, _) as next)) in Seq.pairwise values do
let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime)
if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then
let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) // 42 is chosen here for obvious reasons, making this comment superfluous
yield dummyValue
yield next
}

关于performance - 为什么在本例中使用序列比使用列表慢得多,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1306140/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com