
python - Replace elements of a column using the upper and lower values in pandas (if consecutive values differ by 10)

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 20:52:34

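I have a dataframe with a temperature column. In some rows of that column, consecutive values differ by more than 10, and I want to clean up my dataset. I want to replace such a value with the mean of the upper (previous) and lower (next) values.

I tried some conditional replacement, but it does not work: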

df.loc[df['Temperature1'] > 50, 'Temperature'] = 23

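I tried this, but it changes every element above 50 to 23. What I want instead is to compare two rows, check whether the difference is greater than 10, and replace only those values.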

Best answer

Edit: added an example using a rolling window (see also: window functions)

You can use shift() to bring the values from the upper and lower rows into the middle row.

import pandas as pd

df = pd.DataFrame({'Temperature': [10,30,20,40,50]})

df['upper_row'] = df['Temperature'].shift()
df['lower_row'] = df['Temperature'].shift(-1)

print(df)

Result

   Temperature  upper_row  lower_row
0           10        NaN       30.0
1           30       10.0       20.0
2           20       30.0       40.0
3           40       20.0       50.0
4           50       40.0        NaN

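Now all three values are in the same row, so you can work with them in the usual way: subtract them, compute their mean, compare them, and so on.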

df['difference'] = (df['Temperature'] - df['upper_row']).abs()
df['mean'] = (df['upper_row'] + df['lower_row'])/2

print(df)

Result

   Temperature  upper_row  lower_row  difference  mean
0           10        NaN       30.0         NaN   NaN
1           30       10.0       20.0        20.0  15.0
2           20       30.0       40.0        10.0  35.0
3           40       20.0       50.0        20.0  35.0
4           50       40.0        NaN        10.0   NaN

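You can then replace the values in Temperature wherever the difference is greater than 10:

# .loc assigns into df itself; chained indexing (df['Temperature'][mask] = ...) may write to a temporary copy
df.loc[df['difference'] > 10, 'Temperature'] = df['mean']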

print(df)

Result

   Temperature  upper_row  lower_row  difference  mean
0           10        NaN       30.0         NaN   NaN
1           15       10.0       20.0        20.0  15.0
2           20       30.0       40.0        10.0  35.0
3           35       20.0       50.0        20.0  35.0
4           50       40.0        NaN        10.0   NaN
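
One edge case may be worth handling: `mean` is NaN in the first and last rows because one neighbour is missing, so if the last row were flagged, the replacement would write NaN into Temperature. The following is a minimal sketch that replaces a value only when a mean is available. It assumes that leaving boundary rows unchanged is acceptable, and the sample values are made up for illustration (they are not from the question).

```python
import pandas as pd

# Hypothetical sample data: the last value jumps by more than 10
df = pd.DataFrame({'Temperature': [10.0, 30.0, 20.0, 22.0, 40.0]})

upper = df['Temperature'].shift()     # value from the previous row
lower = df['Temperature'].shift(-1)   # value from the next row

difference = (df['Temperature'] - upper).abs()
mean = (upper + lower) / 2

# Replace only where the jump exceeds 10 AND both neighbours exist
mask = (difference > 10) & mean.notna()
df.loc[mask, 'Temperature'] = mean[mask]

print(df)
```

Here row 1 is replaced with 15.0, while the last row keeps its value of 40.0 because it has no lower neighbour to average with.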

Full example:

import pandas as pd

df = pd.DataFrame({'Temperature': [10,30,20,40,50]})

df['upper_row'] = df['Temperature'].shift()
df['lower_row'] = df['Temperature'].shift(-1)
print(df)

df['difference'] = (df['Temperature'] - df['upper_row']).abs()
df['mean'] = (df['upper_row'] + df['lower_row'])/2
print(df)

# use .loc instead of chained indexing so the assignment modifies df itself
df.loc[df['difference'] > 10, 'Temperature'] = df['mean']
print(df)

Edit: you can also use a rolling window to work with two or three consecutive rows. See the comments in the code.

import pandas as pd

df = pd.DataFrame({'Temperature': [10,30,20,40,50]})

# work with two consecutive rows and result assign to last row
rw2 = df['Temperature'].rolling(2)
df['difference'] = rw2.apply(lambda rows:abs(rows[1] - rows[0]), raw=True)

# work with three consecutive rows and result assign to middle/center row
rw3 = df['Temperature'].rolling(3, center=True)
df['mean'] = rw3.apply(lambda rows:(rows[0] + rows[2])/2, raw=True)

print(df)

# use .loc instead of chained indexing so the assignment modifies df itself
df.loc[df['difference'] > 10, 'Temperature'] = df['mean']
print(df)
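
The same steps can also be wrapped in a small helper that avoids the intermediate columns. This is only a sketch: the function name `smooth_jumps` and its `threshold` parameter are hypothetical names introduced here, and it swaps in `Series.diff()` and `Series.mask()` for the explicit shift and assignment used in the answer. As with the code above, a flagged last row would become NaN, because it has no lower neighbour.

```python
import pandas as pd

def smooth_jumps(temps: pd.Series, threshold: float = 10) -> pd.Series:
    """Replace values that differ from the previous row by more than
    `threshold` with the mean of the previous and next values."""
    jump = temps.diff().abs()                               # |current - previous|
    neighbour_mean = (temps.shift() + temps.shift(-1)) / 2  # mean of upper and lower rows
    # Cast to float first so fractional means fit into the result
    return temps.astype(float).mask(jump > threshold, neighbour_mean)

df = pd.DataFrame({'Temperature': [10, 30, 20, 40, 50]})
df['Temperature'] = smooth_jumps(df['Temperature'])
print(df)
```

With the sample data from the answer, rows 1 and 3 are replaced by 15.0 and 35.0, matching the result of the shift() version.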

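Regarding "python - Replace elements of a column using the upper and lower values in pandas (if consecutive values differ by 10)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56215155/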
