gpt4 book ai didi

python - 选择至少一列中的值为负数的行

转载 作者:太空宇宙 更新时间:2023-11-04 04:25:04 26 4
gpt4 key购买 nike

给定一个 DataFrame:

df = pd.DataFrame(
{'AgeAtMedStart': {1: -46.47, 2: 46.47, 3: 46.8, 4: 51.5, 5: 51.5},
'AgeAtMedStop': {1: 46.8, 2: 46.8, 3: nan, 4: -51.9, 5: 51.81},
'MedContinuing': {1: 'No', 2: 'No', 3: 'Yes', 4: 'No', 5: 'No'},
'Medication': {1: 'Med1', 2: 'Med2', 3: 'Med3', 4: 'Med4', 5: 'Med4'},
'YearOfMedStart': {1: 2016.0, 2: 2016.0, 3: 2016.0, 4: 2016.0, 5: 2016.0}}
)

df
AgeAtMedStart AgeAtMedStop MedContinuing Medication YearOfMedStart
1 -46.47 46.80 No Med1 2016.0
2 46.47 46.80 No Med2 2016.0
3 46.80 NaN Yes Med3 2016.0
4 51.50 -51.90 No Med4 2016.0
5 51.50 51.81 No Med4 2016.0

我想过滤以保留“AgeAt*”列中任何数值为负数的行。

此输出的预期输出是索引为 1 的行,因为“AgeAtMedStart”的值为 -46.47,索引为 4 的行,因为“AgeAtMedStop”为 -51.9,因此输出将是

   AgeAtMedStart  AgeAtMedStop MedContinuing Medication  YearOfMedStart
1 -46.47 46.8 No Med1 2016.0
4 51.50 -51.9 No Med4 2016.0

编辑1:

所以我尝试了迄今为止提供的不同答案,但都返回一个空数据框。我相信部分问题是我有另一个名为 AgeAtMedStartFlag(和 AgeAtMedStopFlag)的列,其中包含字符串。所以对于这个示例 csv:

RecordKey   Medication  CancerSiteForTreatment  CancerSiteForTreatmentCode  TreatmentLineCodeKey    AgeAtMedStart   AgeAtMedStartFlag   YearOfMedStart  MedContinuing   AgeAtMedStop    AgeAtMedStopFlag    ChangeOfTreatment
1 Drug1 Site1 C1.0 First -46.47 Year And Month Are Known But Day Is Missing And Coded To 15 2016 No 46.8 Year And Month Are Known But Day Is Missing And Coded To 15 Yes
1 Drug2 Site2 C1.1 First 46.47 Year And Month Are Known But Day Is Missing And Coded To 15 2016 No 46.8 Year And Month Are Known But Day Is Missing And Coded To 15 Yes
1 Drug3 Site3 C1.2 First 46.8 Year And Month Are Known But Day Is Missing And Coded To 15 2016 Yes Yes
2 Drug4 Site4 C1.3 First 51.5 2016 No 51.9 Yes
2 Drug5 Site5 C1.4 First 51.5 2016 No -51.81 Yes
3 Drug6 Site6 C1.5 First 73.93 2016 No 74.42 Yes
3 Drug7 Site7 C1.6 First 73.93 2016 No 74.42 Yes
4 Drug8 Site8 C1.7 First 36.66 2015 No 37.24 Yes
4 Drug9 Site9 C1.8 First 36.66 2015 No 37.24 Yes
4 Drug10 Site10 C1.9 First 36.66 2015 No 37.24 Yes
9 Drug11 Site11 C1.10 First 43.55 2016 No 43.68 Yes
9 Drug12 Site12 C1.11 First 43.22 2016 No 43.49 Yes
9 Drug13 Site13 C1.12 First 43.55 2016 No 43.68 Yes
9 Drug14 Site14 C1.13 First 43.22 2016 No 43.49 Yes
10 Drug15 Site15 C1.14 First 74.42 2016 No 74.84 Yes
10 Drug16 Site16 C1.15 First 73.56 2015 No 73.98 Yes
10 Drug17 Site17 C1.16 First 73.56 2015 No 73.98 No
10 Drug18 Site18 C1.17 First 74.42 2016 No 74.84 No
10 Drug19 Site19 C1.18 First 73.56 2015 No 73.98 No
10 Drug20 Site20 C1.19 First 74.42 2016 No 74.84 No
11 Drug21 Site21 C1.20 First 70.72 2013 No 72.76 No
11 Drug22 Site22 C1.21 First 68.76 2011 No 70.62 No
11 Drug23 Site23 C1.22 First 73.43 2016 No 73.96 No
11 Drug24 Site24 C1.23 First 72.76 2015 No 73.43 No

对我的脚本进行此更改:

age_df = df.columns[(df.columns.str.startswith('AgeAt')) & (~df.columns.str.endswith('Flag'))]

df[df[age_df] < 0].to_excel('invalid.xlsx', 'Benjamin_Button')

它返回:

RecordKey   Medication  CancerSiteForTreatment  CancerSiteForTreatmentCode  TreatmentLineCodeKey    AgeAtMedStart   AgeAtMedStartFlag   YearOfMedStart  MedContinuing   AgeAtMedStop    AgeAtMedStopFlag    ChangeOfTreatment
1 -46.47
1
1
2
2 -51.81
3
3
4
4
4
9
9
9
9
10
10
10
10
10
10
11
11
11
11

我能否修改此实现以仅返回负数所在的行,并在可能的情况下返回这些行的其余值?或者甚至更好,只有负年龄和该行的 RecordKey。

最佳答案

这是为您准备的简单单行代码。如果您需要从逻辑上确定该列是否为数字,请参阅 coldspeed 的答案。但是,如果您对显式列引用没有问题,那么像这样的简单方法就可以了。

请注意,我也在用 0 填充 NaN;即使数据丢失,这也能满足您的要求。 Nan 可以用其他方式处理,但这在这里就足够了。如果您想保留其他列中的缺失值,也可以这样做(为简单起见,我没有在此处包含它)。

myData = df.fillna(0).query('AgeAtMedStart < 0 or AgeAtMedStop < 0')

返回:

   AgeAtMedStart  AgeAtMedStop MedContinuing Medication  YearOfMedStart
1 -46.47 46.8 No Med1 2016.0
4 51.50 -51.9 No Med4 2016.0

Pandas 原生查询方法对于简单的过滤表达式非常方便。

有关更多信息,请参阅文档:https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html

关于python - 选择至少一列中的值为负数的行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53659027/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com