
python - Can't find the outliers (more specifically the IQR) of my dataset

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:43:28

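I am trying to find the outliers of an Excel sheet using pandas in Python. I can find the first and third quartiles, but I can't subtract one from the other without getting NaN.

Here is the basic code: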

import pandas as pd

absent = pd.read_excel('Absenteeism_at_work.xls')

print("\nOUTLIERS:")
# q1 = (absent.loc[:741, ['Distance from Residence to Work']].quantile([0.25]))
# q3 = (absent.loc[:741, ['Distance from Residence to Work']].quantile([0.75]))

#print(absent.loc[:741, 'Distance from Residence to Work'].quantile([0.25])) #quartile

#print(q1)
# q1, q3 = absent.loc[:741, ['Distance from Residence to Work', 'Transportation expense', 'Month of absence',
# 'Social smoker', 'Social drinker', 'Education']].quantile([0.25 - 0.75])

print(absent.loc[:741, ['Distance from Residence to Work', 'Transportation expense', 'Month of absence',
'Social smoker', 'Social drinker', 'Education']].quantile([0.75])
- absent.loc[:741, ['Distance from Residence to Work', 'Transportation expense', 'Month of absence',
'Social smoker', 'Social drinker', 'Education']].quantile([0.25]))

Output:

OUTLIERS:
      Distance from Residence to Work  Transportation expense  \
0.25                              NaN                     NaN
0.75                              NaN                     NaN

      Month of absence  Social smoker  Social drinker  Education
0.25               NaN            NaN             NaN        NaN
0.75               NaN            NaN             NaN        NaN
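The NaN values here look like the result of index alignment rather than bad data: passing a list to `quantile` returns a DataFrame indexed by the quantile value, so one frame is labelled `0.25` and the other `0.75`, and pandas subtracts by matching labels. A minimal sketch of that behaviour, using made-up toy data (the column names `a` and `b` are placeholders, not columns from the spreadsheet):

```python
import pandas as pd

# Toy data standing in for two numeric columns (values are made up).
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})

# List argument: each result is a DataFrame indexed by the quantile value.
q1 = df.quantile([0.25])   # index: [0.25]
q3 = df.quantile([0.75])   # index: [0.75]

# Subtraction aligns on index labels; 0.25 and 0.75 never match, so every cell is NaN.
print(q3 - q1)

# Scalar argument: each result is a Series indexed by column name, so the labels match.
iqr = df.quantile(0.75) - df.quantile(0.25)
print(iqr)
```

With scalar arguments both results share the column names as their index, so the subtraction yields one IQR value per column.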

Best answer

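  1. Your code only computes the interquartile range (IQR), not the outliers themselves. If that is all you need, great. Real outlier detection is more involved than a quartile-based rule, especially for multivariate data; for that you can turn to Python packages such as sklearn or pyod.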

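  2. The quantile function needs numeric data, so clean the raw data first to make sure every value is a number. This matters here in particular because your data source is an imported Excel file.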

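  3. Check the data

    tmp_df = absent.iloc[:741]

    cols = ['Distance from Residence to Work', 'Transportation expense', 'Month of absence', 'Social smoker', 'Social drinker', 'Education']

    # Pass the quantiles as a list to get both quartiles in one call
    print(tmp_df[cols].quantile([0.25, 0.75]))

    # Summary statistics; non-numeric columns show up with dtype-specific fields
    print(tmp_df[cols].describe(include='all'))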
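The cleaning step in point 2 and the quartile-based rule mentioned in point 1 can be combined into a rough sketch like the one below. Everything here is an assumption for illustration: the helper name `iqr_outliers` is hypothetical, the sample values are made up, and the 1.5 multiplier is the common Tukey convention, not something taken from the question.

```python
import pandas as pd

def iqr_outliers(df, cols, k=1.5):
    """Return the IQR per column and a boolean mask of values outside the fences.

    Hypothetical helper; k=1.5 is the usual Tukey convention (an assumption).
    """
    # Coerce to numeric; anything that cannot be parsed becomes NaN instead of raising.
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    # Scalar quantiles return Series indexed by column name, so they subtract cleanly.
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons against NaN are False, so unparseable cells are never flagged.
    mask = numeric.lt(lower) | numeric.gt(upper)
    return iqr, mask

# Made-up sample: one non-numeric cell and one extreme distance.
sample = pd.DataFrame({
    "Distance from Residence to Work": [10, 12, 11, 13, "n/a", 95],
    "Transportation expense": [100, 110, 105, 120, 115, 108],
})
iqr, mask = iqr_outliers(sample, list(sample.columns))
print(iqr)
print(sample[mask.any(axis=1)])  # rows with at least one flagged value
```

Treating each column separately like this is a univariate rule only; it does not replace the multivariate detectors in sklearn or pyod.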

Good luck.

WY

Regarding "python - Can't find the outliers (more specifically the IQR) of my dataset", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60291380/
