
python - Filter the rows of a DataFrame that contain a specific string

Reposted. Author: 行者123. Updated: 2023-12-04 14:55:25

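I have a huge DataFrame with a patient.drug column. Each element of this column is a list of dictionaries. I want to filter out all the rows whose patient.drug column contains the word "NIFEDIPINE".

The DataFrame is very large. Here is a sample of it.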

                                                         patient.drug
0 [{'drugcharacterization': '1', 'medicinalproduct': 'PANDOL'}]
1 [{'drugcharacterization': '2', 'medicinalproduct': 'NIFEDIPINE'}]
2 [{'drugcharacterization': '3', 'medicinalproduct': 'SIMVASTATIN'}]
3 [{'drugcharacterization': '4', 'medicinalproduct': 'NIFEDIPINE'}]

So far I have tried this:

df[df['patient.drug'].str.contains('NIFEDIPINE')]

But it gives me an error.

 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              ...\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64', length=12000)] are in the [columns]"

I have also tried using the in operator while iterating over the rows.

lst = []
for i in range(len(df)):
    if 'NIFEDIPINE' in df.loc[i, "patirnt.drug"]:
        lst.append(i)
print(lst)

This also gives me an error. What should I do to get this right?

Best answer

After reproducing your data:

>>> df
patient.drug
0 [{'drugcharacterization': '1', 'medicinalproduct': 'PANDOL'}]
1 [{'drugcharacterization': '2', 'medicinalproduct': 'NIFEDIPINE'}]
2 [{'drugcharacterization': '3', 'medicinalproduct': 'SIMVASTATIN'}]
3 [{'drugcharacterization': '3', 'medicinalproduct': 'SIMVASTATIN'}]
4 [{'drugcharacterization': '4', 'medicinalproduct': 'NIFEDIPINE'}]

Running your code:

>>> df[df['patient.drug'].str.contains('NIFEDIPINE')]

Error:

    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Float64Index([nan, nan, nan, nan, nan], dtype='float64')] are in the [columns]"

Solution:

>>> df[df['patient.drug'].astype('str').str.contains('NIFEDIPINE')]
patient.drug
1 [{'drugcharacterization': '2', 'medicinalproduct': 'NIFEDIPINE'}]
4 [{'drugcharacterization': '4', 'medicinalproduct': 'NIFEDIPINE'}]
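
The solution above can be reproduced end to end with the short sketch below. It rebuilds the sample frame by hand, applies the string-conversion approach, and also shows one possible alternative that checks the medicinalproduct field of each dictionary directly instead of searching the text form of the whole list. The helper name has_product is hypothetical and not part of the original answer, and the sketch assumes every cell is either a list of dictionaries or a missing value.

```python
import pandas as pd

# Sample data copied from the answer; each cell holds a list of dicts.
df = pd.DataFrame({
    "patient.drug": [
        [{"drugcharacterization": "1", "medicinalproduct": "PANDOL"}],
        [{"drugcharacterization": "2", "medicinalproduct": "NIFEDIPINE"}],
        [{"drugcharacterization": "3", "medicinalproduct": "SIMVASTATIN"}],
        [{"drugcharacterization": "3", "medicinalproduct": "SIMVASTATIN"}],
        [{"drugcharacterization": "4", "medicinalproduct": "NIFEDIPINE"}],
    ]
})


def has_product(drugs, name):
    """Hypothetical helper: True if any dict in the list names this product."""
    # Assumption: cells that are not lists (for example NaN) never match.
    if not isinstance(drugs, list):
        return False
    return any(
        isinstance(d, dict) and d.get("medicinalproduct") == name
        for d in drugs
    )


# Approach from the answer: substring search on the string form of each cell.
by_text = df[df["patient.drug"].astype(str).str.contains("NIFEDIPINE", regex=False)]

# Alternative sketch: exact match on the medicinalproduct field.
mask = df["patient.drug"].apply(lambda drugs: has_product(drugs, "NIFEDIPINE"))
by_field = df[mask]

# To drop the matching rows instead of keeping them, invert the mask.
without = df[~mask]

print(by_text.index.tolist())
print(by_field.index.tolist())
print(without.index.tolist())
```

The substring search is the shorter of the two, but it would also match "NIFEDIPINE" appearing in any other key or value of the dictionaries. The field-based check is stricter and may be slower on a very large frame, since it runs a Python function per row.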

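Note:

The KeyError comes from the indexer check in pandas' indexing.py, shown below. The cells of patient.drug are lists rather than strings, so .str.contains returns NaN for every row; judging by the traceback, pandas then treats that all-NaN result as a list of column labels instead of a boolean mask, finds none of them among the columns, and raises.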

--> pandas/core/indexing.py

# Count missing values:
missing_mask = indexer < 0
missing = (missing_mask).sum()

if missing:
    if missing == len(indexer):
        axis_name = self.obj._get_axis_name(axis)
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")

    # We (temporarily) allow for some missing keys with .loc, except in
    # some cases (e.g. setting) in which "raise_missing" will be False
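
To see why the original mask ends up as all NaN before it ever reaches that check, the following small sketch compares the result of .str.contains on list-valued cells with and without the string conversion. It uses a two-row Series built by hand, and it reflects the behaviour shown in the traceback above; other pandas versions may report the failed lookup differently.

```python
import pandas as pd

# Two cells copied from the sample data; each one is a list, not a string.
s = pd.Series([
    [{"drugcharacterization": "1", "medicinalproduct": "PANDOL"}],
    [{"drugcharacterization": "2", "medicinalproduct": "NIFEDIPINE"}],
])

# .str.contains cannot search inside a list, so every row comes back missing.
raw_mask = s.str.contains("NIFEDIPINE")
print(raw_mask.isna().all())

# Converting each cell to its string form first yields a real boolean mask.
text_mask = s.astype(str).str.contains("NIFEDIPINE")
print(text_mask.tolist())
```

Because raw_mask holds no True or False values, it cannot act as a row filter, which is why the string conversion in the solution is needed.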

Regarding python - filtering the rows of a DataFrame that contain a specific string, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68126025/
