
python - Searching for and removing duplicate time-series data in a Pandas DataFrame

Reposted. Author: 行者123. Updated: 2023-11-28 17:26:38

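I have a data source that I can call to get end-of-day time-series data for financial instruments.

For example, suppose I have already coded up and retrieved the following data for two instruments from the data source: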

price_a

[24.74636733, 29.65460993, 28.09686357, 16.24366395, 27.26716605, 17.1444073, 18.76608861, 17.68487362, 19.5026825, 25.62365151, 12.92619601, 25.66759065, 24.40646289, 15.61753458, 13.82584258, 27.2508518, 12.22547517, 24.2317834, 13.33257932, 28.18551972, 19.11053867, 10.43027953, 21.18221807, 15.1889216, 27.65876136, 16.72982501, 14.0134465, 22.68824162, 19.14317233, 13.57868721]

price_b

[21.01623084, 27.6426434, 20.16877846, 27.41341083, 23.39068249, 20.65973567, 28.11032189, 21.85843902, 20.26838929, 28.52493215, 24.11865407, 28.30861237, 20.51648305, 21.75927511, 21.82957788, 25.4647031, 25.4647031, 25.4647031, 25.4647031, 25.4647031, 25.4647031, 25.4647031, 25.4647031, 21.5721344, 20.41526114, 24.24593747, 25.23109812, 26.11780617, 25.13995547, 25.2511254]

days

['2016-06-01', '2016-06-02', '2016-06-03', '2016-06-04', '2016-06-05', '2016-06-06', '2016-06-07', '2016-06-08', '2016-06-09', '2016-06-10', '2016-06-11', '2016-06-12', '2016-06-13', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-17', '2016-06-18', '2016-06-19', '2016-06-20', '2016-06-21', '2016-06-22', '2016-06-23', '2016-06-24', '2016-06-25', '2016-06-26', '2016-06-27', '2016-06-28', '2016-06-29', '2016-06-30']

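price_b contains repeated data. Suppose I know that repeats only ever occur in runs of 8 or more identical values (as in the example), and that any repetition shorter than 8 is coincidental. Is there a way to detect the repeated runs in price_b and then, using the indices of each run, remove the corresponding entries from price_a?

Expected output: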

price_a

[24.74636733, 29.65460993, 28.09686357, 16.24366395, 27.26716605, 17.1444073, 18.76608861, 17.68487362, 19.5026825, 25.62365151, 12.92619601, 25.66759065, 24.40646289, 15.61753458, 13.82584258, 27.2508518, 15.1889216, 27.65876136, 16.72982501, 14.0134465, 22.68824162, 19.14317233, 13.57868721]

price_b

[21.01623084, 27.6426434, 20.16877846, 27.41341083, 23.39068249, 20.65973567, 28.11032189, 21.85843902, 20.26838929, 28.52493215, 24.11865407, 28.30861237, 20.51648305, 21.75927511, 21.82957788, 25.4647031, 21.5721344, 20.41526114, 24.24593747, 25.23109812, 26.11780617, 25.13995547, 25.2511254]

days

['2016-06-01', '2016-06-02', '2016-06-03', '2016-06-04', '2016-06-05', '2016-06-06', '2016-06-07', '2016-06-08', '2016-06-09', '2016-06-10', '2016-06-11', '2016-06-12', '2016-06-13', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-24', '2016-06-25', '2016-06-26', '2016-06-27', '2016-06-28', '2016-06-29', '2016-06-30']
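Because the data arrives as plain lists, the run detection itself can be sketched without pandas. This is a minimal sketch, not the accepted answer's method. It uses `itertools.groupby`, which groups consecutive equal elements. The helper names `long_run_drop_indices` and `drop_indices` and the `keep_first` option are hypothetical, introduced only for illustration. With `keep_first=True` the first element of each long run survives, which matches the expected output (2016-06-16 is kept). The sketch assumes the source repeats the exact same float, so plain equality is enough.

```python
from itertools import groupby


def long_run_drop_indices(values, min_len=8, keep_first=True):
    """Return the indices of values that sit inside a run of at least
    `min_len` identical consecutive values (hypothetical helper).

    With keep_first=True the first index of each such run is not
    returned, so one copy of the repeated value survives.
    """
    drop = []
    pos = 0  # index of the first element of the current run
    for _, run in groupby(values):
        run_len = sum(1 for _ in run)
        if run_len >= min_len:
            start = pos + 1 if keep_first else pos
            drop.extend(range(start, pos + run_len))
        pos += run_len
    return drop


def drop_indices(seq, indices):
    """Return a copy of `seq` without the elements at `indices`."""
    skip = set(indices)
    return [v for k, v in enumerate(seq) if k not in skip]


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 3.0, 3.0, 4.0]
    a = ["w", "x", "y", "z", "q", "r"]
    idx = long_run_drop_indices(b, min_len=3)
    print(idx)                   # [3, 4]
    print(drop_indices(a, idx))  # ['w', 'x', 'y', 'r']
```

Applied to the question's data, `long_run_drop_indices(price_b)` returns the indices 16 to 22 (2016-06-17 to 2016-06-23). Passing those indices to `drop_indices` for `price_a`, `price_b` and `days` yields the expected output.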

Best answer

import pandas as pd

price_a = [24.74636733, 29.65460993, 28.09686357, 16.24366395, 27.26716605,
17.1444073, 18.76608861, 17.68487362, 19.5026825, 25.62365151,
12.92619601, 25.66759065, 24.40646289, 15.61753458, 13.82584258,
27.2508518, 12.22547517, 24.2317834, 13.33257932, 28.18551972,
19.11053867, 10.43027953, 21.18221807, 15.1889216, 27.65876136,
16.72982501, 14.0134465, 22.68824162, 19.14317233, 13.57868721]

price_b = [21.01623084, 27.6426434, 20.16877846, 27.41341083, 23.39068249,
20.65973567, 28.11032189, 21.85843902, 20.26838929, 28.52493215,
24.11865407, 28.30861237, 20.51648305, 21.75927511, 21.82957788,
25.4647031, 25.4647031, 25.4647031, 25.4647031, 25.4647031,
25.4647031, 25.4647031, 25.4647031, 21.5721344, 20.41526114,
24.24593747, 25.23109812, 26.11780617, 25.13995547, 25.2511254]

days = ['2016-06-01', '2016-06-02', '2016-06-03', '2016-06-04', '2016-06-05',
'2016-06-06', '2016-06-07', '2016-06-08', '2016-06-09', '2016-06-10',
'2016-06-11', '2016-06-12', '2016-06-13', '2016-06-14', '2016-06-15',
'2016-06-16', '2016-06-17', '2016-06-18', '2016-06-19', '2016-06-20',
'2016-06-21', '2016-06-22', '2016-06-23', '2016-06-24', '2016-06-25',
'2016-06-26', '2016-06-27', '2016-06-28', '2016-06-29', '2016-06-30']

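# Index the prices by date
df = pd.DataFrame(dict(
    a=price_a,
    b=price_b,
), index=pd.to_datetime(days))

# A new partition id starts wherever b changes value, so each run of
# identical consecutive values in b shares one id
partitions = (df.b.diff() != 0).cumsum()
# Number of rows in each run
vc = partitions.value_counts()
# Ids of runs with 8 or more identical values
vc8 = vc[vc >= 8].index

# Drop every row belonging to such a run (this includes the run's first row)
df[~partitions.isin(vc8)]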
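The filter above removes every row of a long run, including its first row (2016-06-16), whereas the question's expected output keeps that first row. The following is a minimal sketch of a variant that keeps it, built on the same partitioning idea. `drop_long_runs` and its `min_len` and `keep_first` parameters are hypothetical names introduced here. `Series.duplicated()` is used to spare the first row of each run.

```python
import pandas as pd


def drop_long_runs(df, col, min_len=8, keep_first=True):
    """Drop rows of `df` whose `col` value is part of a run of at least
    `min_len` identical consecutive values (hypothetical helper)."""
    # A new run id starts wherever the value changes (the first diff is
    # NaN, and NaN != 0 is True, so the first row starts run 1)
    run_id = (df[col].diff() != 0).cumsum()
    # Length of the run that each row belongs to
    run_len = run_id.map(run_id.value_counts())
    in_long_run = run_len >= min_len
    if keep_first:
        # duplicated() is False for the first row carrying each run id,
        # so that row is kept
        in_long_run &= run_id.duplicated()
    return df[~in_long_run]


if __name__ == "__main__":
    demo = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6],
                         "b": [9.0, 7.0, 7.0, 7.0, 8.0, 8.0]})
    print(drop_long_runs(demo, "b", min_len=3))  # keeps rows 0, 1, 4 and 5
```

Calling `drop_long_runs(df, "b")` on the DataFrame built above drops 2016-06-17 through 2016-06-23 and keeps 2016-06-16, which matches the expected output. Passing `keep_first=False` reproduces the behavior of the original filter.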


For python - Searching for and removing duplicate time-series data in a Pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38409805/
