python - Filtering CSV data with Python/numpy

Reposted. Author: 行者123. Updated: 2023-11-28 22:51:58

I am working with a CSV file.

     id  gender  disease           read  write  science
 1.  11  male    cancer, diabetes  34    46     39
 2.  20  male    diabetes          60    52     61
 3.  12  male    diabetes          37    44     39
 4.  16  male    cancer            47    31     36
 5.   7  male    diabetes          57    54     47
 6.  21  male    diabetes          44    44     50
 7.  15  male    diabetes          39    39     26
 8.  22  male    diabetes          42    39     56
 9.   9  male    cancer            48    49     44
10.  18  male    diabetes          50    33     44
11.   5  male    diabetes          47    40     .
12.  14  male    diabetes          47    41     42
13.   3  male    diabetes          63    65     63
14.  24  male    fever             52    62     47
15.   8  female  diabetes          39    44     44
16.   1  female  cancer            34    44     39
17.   4  female  diabetes          44    50     39
18.   2  female  diabetes          39    41     42
19.  19  female  cancer            28    46     44
20.  17  female  diabetes          47    57     44
21.   6  female  diabetes          47    41     40
22.  10  female  diabetes          47    54     53
23.  13  female  diabetes          47    46     47
24.  23  female  diabetes          65    65     58
25.  25  female  Breast cancer     47    44     42

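I want to get all rows where the person has cancer. Some people have both diabetes and cancer, so those rows must be matched as well. The result should be: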

 1.  11  male    cancer, diabetes  34    46     39
 4.  16  male    cancer            47    31     36
 9.   9  male    cancer            48    49     44
19.  19  female  cancer            28    46     44
25.  25  female  Breast cancer     47    44     42


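from os.path import dirname, join

import numpy as np
import pandas as pd

# read data.csv from the directory that contains this script;
# read_csv already returns a DataFrame, so no from_records conversion is needed
delta = pd.read_csv(join(dirname(__file__), 'data.csv'))
disease = delta['disease']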

Now, how do I filter the disease column, and once it is filtered, how do I get the data in the matching rows (id, gender, read, write, science)?

Best answer

Here is a more pandas-centric approach: first read all of the data into a DataFrame, then create a has cancer column, and finally filter on it:

from io import StringIO  # Python 3; the Python 2 StringIO module no longer exists
import pandas

datastring = StringIO("""\
id,gender,disease,read,write,science
11,male,"cancer,diabetes",34,46,39
20,male,diabetes,60,52,61
12,male,diabetes,37,44,39
16,male,cancer,47,31,36
7,male,diabetes,57,54,47
21,male,diabetes,44,44,50
15,male,diabetes,39,39,26
22,male,diabetes,42,39,56
9,male,cancer,48,49,44
18,male,diabetes,50,33,44
5,male,diabetes,47,40,-999
14,male,diabetes,47,41,42
3,male,diabetes,63,65,63
24,male,fever,52,62,47
8,female,diabetes,39,44,44
1,female,cancer,34,44,39
4,female,diabetes,44,50,39
2,female,diabetes,39,41,42
19,female,cancer,28,46,44
17,female,diabetes,47,57,44
6,female,diabetes,47,41,40
10,female,diabetes,47,54,53
13,female,diabetes,47,46,47
23,female,diabetes,65,65,58
25,female,"Breast cancer",47,44,42
""")

df = pandas.read_csv(datastring, na_values=-999)

# create the `has cancer` column
df['has cancer'] = df.disease.apply(lambda row: 'cancer' in row)

# print the filtered data
print(df[df['has cancer']].to_string())


    id  gender  disease          read  write  science  has cancer
0   11  male    cancer,diabetes  34    46     39       True
3   16  male    cancer           47    31     36       True
8    9  male    cancer           48    49     44       True
15   1  female  cancer           34    44     39       True
18  19  female  cancer           28    46     44       True
24  25  female  Breast cancer    47    44     42       True
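
The question also asks for only the id, gender, read, write and science columns of the matching rows. The following is a minimal sketch of one way to do that, not part of the original answer: it swaps the `apply` call for pandas' vectorized `Series.str.contains` and selects columns with `.loc`. The names `csv_text`, `mask`, `wanted_columns` and `cancer_rows` are made up for illustration, and the inline data is just a small subset of the rows above.

```python
from io import StringIO

import pandas as pd

# A small subset of the rows above, inlined so the sketch runs on its own.
csv_text = StringIO("""\
id,gender,disease,read,write,science
11,male,"cancer,diabetes",34,46,39
20,male,diabetes,60,52,61
16,male,cancer,47,31,36
24,male,fever,52,62,47
25,female,"Breast cancer",47,44,42
""")

df = pd.read_csv(csv_text)

# Boolean mask: True where the disease text contains "cancer".
# case=False makes the match case-insensitive; na=False treats
# missing disease values as non-matches instead of propagating NaN.
mask = df["disease"].str.contains("cancer", case=False, na=False)

# Keep only the matching rows and the columns the question asks for.
wanted_columns = ["id", "gender", "read", "write", "science"]
cancer_rows = df.loc[mask, wanted_columns]

print(cancer_rows.to_string(index=False))
```

Because the mask is a plain boolean Series, it can be combined with other conditions using `&` and `|` before it is passed to `.loc`, for example to restrict the result to one gender.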

Regarding python - Filtering CSV data with Python/numpy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21011571/
