
python - Remove outliers from each column based on the lower_bound and upper_bound in another DataFrame

Reposted. Author: 行者123. Last updated: 2023-12-04 03:26:51

Python 3.8, pandas 1.2.4

MRE:

import pandas as pd

a = pd.DataFrame({"mean": [3.3, 2.9, 3.2, 5, 3.7, 5.3, 5.8, 5.7],
                  "lower_bound": [1, 1, 1, 2, 3, 3, 4, 5],
                  "upper_bound": [4, 4, 6, 7, 8, 9, 9, 9]})

data = pd.DataFrame({0: [3, 2, 4, 3, 0, 5, 5, 3, 1, 2, 3, 4, 5, 6],
                     1: [1, 3, 2, 4, 5, 5, 0, 6, 3, 4, 2, 1, 2, 3],
                     2: [3, 4, 2, 5, 5, 4, 2, 4, 3, 2, 1, 2, 3, 5],
                     3: [1, 1, 2, 3, 4, 3, 9, 7, 6, 7, 6, 7, 7, 7],
                     4: [3, 2, 2, 2, 1, 2, 3, 4, 6, 4, 6, 8, 9, 0],
                     5: [2, 4, 5, 3, 4, 6, 7, 5, 3, 4, 7, 8, 9, 7],
                     6: [3, 4, 6, 6, 5, 5, 7, 6, 5, 7, 4, 7, 8, 8],
                     7: [3, 4, 5, 6, 6, 6, 8, 7, 5, 7, 5, 6, 7, 5]})

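Row i of a holds the bounds for column i of data. For each column of the data DataFrame, I want to set every value that falls outside [lower_bound, upper_bound] to NaN.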

Expected output:

    0   1   2   3   4   5   6   7
0 3 1 3 NaN ..
1 2 3 4 NaN ..
2 4 2 2 2
3 3 4 5 3 ..
4 NaN NaN 5 4
5 NaN NaN 4 3 .. ..

Thanks in advance.

Edit: additionally, how can I replace the values outside [lower_bound, upper_bound] with the mean from the a DataFrame instead of NaN?

Best answer

Let's try where:

import pandas as pd
import numpy as np

a = pd.DataFrame({"mean": [3.3, 2.9, 3.2, 5, 3.7, 5.3, 5.8, 5.7],
                  "lower_bound": [1, 1, 1, 2, 3, 3, 4, 5],
                  "upper_bound": [4, 4, 6, 7, 8, 9, 9, 9]})

data = pd.DataFrame({0: [3, 2, 4, 3, 0, 5, 5, 3, 1, 2, 3, 4, 5, 6],
                     1: [1, 3, 2, 4, 5, 5, 0, 6, 3, 4, 2, 1, 2, 3],
                     2: [3, 4, 2, 5, 5, 4, 2, 4, 3, 2, 1, 2, 3, 5],
                     3: [1, 1, 2, 3, 4, 3, 9, 7, 6, 7, 6, 7, 7, 7],
                     4: [3, 2, 2, 2, 1, 2, 3, 4, 6, 4, 6, 8, 9, 0],
                     5: [2, 4, 5, 3, 4, 6, 7, 5, 3, 4, 7, 8, 9, 7],
                     6: [3, 4, 6, 6, 5, 5, 7, 6, 5, 7, 4, 7, 8, 8],
                     7: [3, 4, 5, 6, 6, 6, 8, 7, 5, 7, 5, 6, 7, 5]})

# The index of each bound Series (0..7) is matched against the column
# labels of data (0..7), so column i is checked against row i of a.
mask = (a['lower_bound'] <= data) & (data <= a['upper_bound'])
# Keep values where mask is True; everything else becomes NaN.
data = data.where(mask, np.nan)
print(data)

Output:

      0    1  2    3    4    5    6    7
0 3.0 1.0 3 NaN 3.0 NaN NaN NaN
1 2.0 3.0 4 NaN NaN 4.0 4.0 NaN
2 4.0 2.0 2 2.0 NaN 5.0 6.0 5.0
3 3.0 4.0 5 3.0 NaN 3.0 6.0 6.0
4 NaN NaN 5 4.0 NaN 4.0 5.0 6.0
5 NaN NaN 4 3.0 NaN 6.0 5.0 6.0
6 NaN NaN 2 NaN 3.0 7.0 7.0 8.0
7 3.0 NaN 4 7.0 4.0 5.0 6.0 7.0
8 1.0 3.0 3 6.0 6.0 3.0 5.0 5.0
9 2.0 4.0 2 7.0 4.0 4.0 7.0 7.0
10 3.0 2.0 1 6.0 6.0 7.0 4.0 5.0
11 4.0 1.0 2 7.0 8.0 8.0 7.0 6.0
12 NaN 2.0 3 7.0 NaN 9.0 8.0 7.0
13 NaN 3.0 5 7.0 NaN 7.0 8.0 5.0
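
The comparison above relies on pandas matching the index of each bound Series against the column labels of data. As a minimal sketch of the same idea with that alignment spelled out, the flexible comparison methods ge and le accept an explicit axis. The bounds and values frames below are small made-up examples, not the data from the question:

```python
import pandas as pd

# Hypothetical frames for illustration only (not the question's data).
bounds = pd.DataFrame({"lower_bound": [1, 2], "upper_bound": [4, 3]})
values = pd.DataFrame({0: [0, 3, 5], 1: [2, 3, 4]})

# axis="columns" matches the Series index (0, 1) against the column
# labels of `values`, so column i is compared with row i of `bounds`.
in_range = (values.ge(bounds["lower_bound"], axis="columns")
            & values.le(bounds["upper_bound"], axis="columns"))

# where() keeps cells whose mask is True and sets the rest to NaN.
cleaned = values.where(in_range)
print(cleaned)
```

Here column 0 is checked against [1, 4] and column 1 against [2, 3], so the cells holding 0, 5 and 4 come back as NaN.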

Edit: to replace with the mean instead:

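mask = (a['lower_bound'] <= data) & (data <= a['upper_bound'])
# axis=1 aligns a['mean'] with the columns of data, so each
# out-of-range value is replaced by the mean for its own column.
data = data.where(mask, a['mean'], axis=1)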

Output:

      0    1  2  3    4    5    6    7
0 3.0 1.0 3 5 3.0 5.3 5.8 5.7
1 2.0 3.0 4 5 3.7 4.0 4.0 5.7
2 4.0 2.0 2 2 3.7 5.0 6.0 5.0
3 3.0 4.0 5 3 3.7 3.0 6.0 6.0
4 3.3 2.9 5 4 3.7 4.0 5.0 6.0
5 3.3 2.9 4 3 3.7 6.0 5.0 6.0
6 3.3 2.9 2 5 3.0 7.0 7.0 8.0
7 3.0 2.9 4 7 4.0 5.0 6.0 7.0
8 1.0 3.0 3 6 6.0 3.0 5.0 5.0
9 2.0 4.0 2 7 4.0 4.0 7.0 7.0
10 3.0 2.0 1 6 6.0 7.0 4.0 5.0
11 4.0 1.0 2 7 8.0 8.0 7.0 6.0
12 3.3 2.0 3 7 3.7 9.0 8.0 7.0
13 3.3 3.0 5 7 3.7 7.0 8.0 5.0
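
Both variants only work because the index of a (0 to 7) happens to equal the column labels of data. If the columns had other names, the labels would not match and the mask would come out False everywhere. A rough sketch of one way to guard against that is to relabel the bounds positionally first. The helper name replace_outliers, its fill_col parameter, and the sample frames are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

def replace_outliers(data, stats, fill_col=None):
    """Hypothetical helper: row i of `stats` describes the i-th column of `data`.

    Out-of-range values become NaN, or the per-column value taken from
    stats[fill_col] when fill_col is given.
    """
    if len(stats) != data.shape[1]:
        raise ValueError("stats needs exactly one row per column of data")
    # Relabel stats by position so its index equals data's column labels.
    stats = stats.set_axis(data.columns, axis=0)
    mask = (data.ge(stats["lower_bound"], axis="columns")
            & data.le(stats["upper_bound"], axis="columns"))
    if fill_col is None:
        return data.where(mask)
    return data.where(mask, stats[fill_col], axis=1)

# Made-up example with string column names.
stats = pd.DataFrame({"mean": [2.5, 3.0],
                      "lower_bound": [1, 2],
                      "upper_bound": [4, 3]})
sample = pd.DataFrame({"x": [0, 3, 5], "y": [2, 3, 4]})

as_nan = replace_outliers(sample, stats)
as_mean = replace_outliers(sample, stats, fill_col="mean")
print(as_nan)
print(as_mean)
```

Matching by position is a design choice: it assumes the rows of the bounds frame are in the same order as the columns of the data, which is how the question is set up.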

This question, "python - Remove outliers from each column based on the lower_bound and upper_bound in another DataFrame", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/67443527/
