python - Removing duplicate data in Python

Reposted. Author: 行者123. Updated: 2023-12-01 01:11:11

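I have a huge database of the flow distribution on a grid inside a room. The problem is that the grid is very fine, so part of the data is unnecessary and makes my computations difficult. In the y dimension each grid cell is 0.00032 long, and y runs from 0 to 0.45, so, as you can imagine, there is a lot of redundant data.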

I want to make each grid cell 0.00128 long by deleting the rows whose y value is not divisible by 0.00128. How can I do that?

trainProcessed = trainProcessed[trainProcessed[:,4]%0.00128==0]

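I tried this line (trainProcessed is my NumPy array), but it only keeps 0 -> 0.00128 -> 0.00256 -> 0.00512. Some rows have the value 0.00384, which is also divisible by 0.00128, yet they are dropped. By the way, the array shape is (888300, 8).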
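Testing a floating-point remainder with exact `==` is fragile: 0.00128 and its multiples have no exact binary representation, so `%` can return a value just above 0 or just below 0.00128 instead of exactly 0. A minimal NumPy sketch of a tolerance-based filter, assuming y is column 4 (as in the attempted line) and that an absolute tolerance of 1e-7 (a hypothetical choice, far below the 0.00032 spacing) is acceptable; `keep_multiples` is a hypothetical helper name:

```python
import numpy as np

def keep_multiples(data, col, step, atol=1e-7):
    """Return the rows of `data` whose value in column `col` is a multiple of `step`,
    tolerating floating-point error of up to `atol` (assumed tolerance)."""
    r = np.remainder(data[:, col], step)
    # A true multiple gives a remainder near 0 or, because of rounding error, near `step`.
    mask = (np.isclose(r, 0.0, rtol=0.0, atol=atol)
            | np.isclose(r, step, rtol=0.0, atol=atol))
    return data[mask]

# Synthetic example: 8 columns, y values in column 4 on a 0.00032 grid (0 ... 0.00384)
y = np.arange(13) * 0.00032
demo = np.zeros((y.size, 8))
demo[:, 4] = y

kept = keep_multiples(demo, col=4, step=0.00128)
print(kept[:, 4])
```

On the question's array this would be called as `trainProcessed = keep_multiples(trainProcessed, 4, 0.00128)`. Noisy values such as 0.00031999 leave a remainder far from both 0 and 0.00128, so they are dropped as intended.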

Sample data:

X: [0,0,0,0,0.00031999,0.00031999,0.00063999,0.00064,0.00096,0.00096,0.00128,0.00128]

Sample output:

X: [0,0,0,0,0.00128,0.00128]
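Since the original cells are 0.00032 apart and 0.00128 / 0.00032 = 4, another option is to round each y value to its nearest integer grid index and keep every fourth index. This avoids floating-point modulo entirely and absorbs noise such as 0.00031999. A sketch under those assumptions (grid starting at 0, spacing exactly 0.00032); `thin_grid` is a hypothetical name:

```python
import numpy as np

def thin_grid(values, base_step=0.00032, factor=4):
    """Keep values that lie on every `factor`-th point of a grid with spacing `base_step`.

    Rounding to the nearest grid index maps noisy values like 0.00031999 to index 1.
    """
    values = np.asarray(values, dtype=float)
    idx = np.rint(values / base_step).astype(np.int64)  # nearest grid index
    return values[idx % factor == 0]

# Sample X values; 0.00128 is taken as the grid point after 0.00096
x = [0, 0, 0, 0, 0.00031999, 0.00031999, 0.00063999, 0.00064,
     0.00096, 0.00096, 0.00128, 0.00128]
print(thin_grid(x))
```

This keeps the four zeros and the two 0.00128 values, matching the sample output. For the full array, the same mask can be applied to whole rows: `trainProcessed[np.rint(trainProcessed[:, 4] / 0.00032).astype(np.int64) % 4 == 0]`.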

Best answer

For this case and the modulo operation, I would use Decimal:

import pandas as pd
from decimal import Decimal

df = pd.DataFrame({'values': [0.00128, 0.00384, 0.367, 0.128, 0.34]})
print(df)

# Convert each float to str, then to Decimal, so the modulo is computed in exact decimal arithmetic
# (str() of very small floats uses exponent notation, e.g. '1.28e-05', which Decimal also parses,
# so no rescaling is needed)
step = Decimal('0.00128')
mask = df['values'].apply(lambda v: Decimal(str(v)) % step == 0)

# Keep only the rows whose value is divisible by 0.00128
# (use df[~mask] instead to keep the rows that are NOT divisible)
df = df[mask].reset_index(drop=True)
print(df)

Initial output:

    values
0 0.00128
1 0.00384
2 0.36700
3 0.12800
4 0.34000

Final output:

    values
0 0.00128
1 0.00384
2 0.12800
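The `str()` step in the answer matters: `Decimal(float)` captures the exact binary value stored in the float, which is not exactly 0.00128, whereas `Decimal(str(v))` starts from the short decimal text Python prints for the float. A small standard-library check of this behavior:

```python
from decimal import Decimal

step = Decimal('0.00128')

# Decimal built directly from a float keeps the float's exact binary expansion,
# which differs from the decimal 0.00128
print(Decimal(0.00128) == step)            # False

# Going through str() uses the shortest decimal representation of the float
print(Decimal(str(0.00128)) == step)       # True
print(Decimal(str(0.00384)) % step == 0)   # True: 0.00384 = 3 * 0.00128
print(Decimal(str(0.367)) % step == 0)     # False: remainder is 0.00092
```

Note that a row-wise `apply` calls a Python function per row, so on an 888,300-row array it is likely to be much slower than a vectorized NumPy mask.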

Regarding "python - Removing duplicate data in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54855279/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号