
python - Data cleaning in pandas: replacing null values with specific strings if these strings are contained in another column

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:41:28

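I am currently working on a car emissions dataset in which I want to clean and standardize the car model names. The dataset is fairly large, but here are the first 10 rows: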

    import pandas as pd

    cars_em_df = pd.DataFrame({
        'manufacturer_name_mapped': ['FIAT', 'FIAT', 'FIAT', 'FIAT', 'FIAT',
                                     'BMW AG', 'BMW AG', 'BMW AG', 'BMW AG', 'BMW AG'],
        'commercial_name': ['124 gt multiair auto', '500l wagon pop star t-jet',
                            'doblo combi 1.4 95', 'panda 0.9t sge 85 natural power',
                            'punto 1.4 77 lpg', 'x4 xdrive20d se auto',
                            '216d active tourer b37 f45', '220d gran tourer b47 f46',
                            'x1 xdrive18d sport', '320i xdrive m sport gt auto'],
        'fuel_type_mapped': ['Petrol', 'Petrol', 'Petrol', 'NG-Biomethane', 'LPG',
                             'Diesel', 'Diesel', 'Diesel', 'Diesel', 'Petrol'],
        'file_year': [2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018],
        'emissions': [153, 158, 165, 86, 114, 131, 166, 200, 151, 149],
        'commercial_name_cleaned': ['124', '500', None, 'panda', 'punto',
                                    'x4', None, None, 'x1', None],
    })

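The right-hand column 'commercial_name_cleaned' is the result of my first cleaning pass, in which I matched the names in the 'commercial_name' column against a standardized list of names from a different source. As you can see, these names are very simple and short. Whenever I could not match a model name, my function returned None.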

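As a second step, I now want to do the following: if the value is None, search the adjacent 'commercial_name' column for specific strings and replace the None with a model name that I specify. I tried this: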

    def str_ops(commercial_name_cleaned, commercial_name):
        if commercial_name_cleaned == None:
            if '216' in commercial_name:
                return '2-series'
            elif '220' in commercial_name:
                return '2-series'
            elif '320' in commercial_name:
                return '3-series'

I would then apply this function to the DataFrame:

    cars_em_df['commercial_name_cleaned'] = cars_em_df.apply(lambda x: str_ops(str(x.commercial_name_cleaned), str(x.commercial_name)), axis=1)

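Note that if '320', '220', etc. are not found in 'commercial_name', the function should not change anything and should simply return the value already in 'commercial_name_cleaned'. However, when I apply the function, the entire 'commercial_name_cleaned' column turns into None values. So something must be wrong with the function. Does anyone know how to fix this?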

Thanks a lot for your help!

Best answer

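You get None values in the commercial_name_cleaned column because str_ops does not return anything for those rows: when a function does not explicitly return a value, Python implicitly returns None. Note also that the apply call passes str(x.commercial_name_cleaned), so a missing value arrives in the function as the string 'None', and the comparison with None is never true.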
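Both effects can be checked in isolation. The following is a minimal sketch, not part of the original answer; the helper name `no_explicit_return` is hypothetical and only serves to illustrate the implicit return value, while the `str()` cast mirrors the one in the question's `apply` call.

```python
def no_explicit_return(value):
    # Hypothetical helper: only one branch returns something.
    if value == "match":
        return "matched"
    # Falling off the end of a function returns None implicitly.


implicit_result = no_explicit_return("something else")

# The question's apply() call wraps each value in str() before passing it on.
casted = str(None)

# Mirrors the question's comparison: the string 'None' is not the None object.
comparison = casted == None  # noqa: E711

print(implicit_result)
print(repr(casted))
print(comparison)
```

Because the cast turns a missing value into the string 'None', a check against the None object never succeeds, so every row falls through to the implicit return.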

Replace:

    def str_ops(commercial_name_cleaned, commercial_name):
        if commercial_name_cleaned == None:
            if '216' in commercial_name:
                return '2-series'
            elif '220' in commercial_name:
                return '2-series'
            elif '320' in commercial_name:
                return '3-series'

with:

    def str_ops(commercial_name_cleaned, commercial_name):
        # The caller passes str(value), so a missing value arrives as 'None'.
        if commercial_name_cleaned == 'None':
            if '216' in commercial_name:
                return '2-series'
            elif '220' in commercial_name:
                return '2-series'
            elif '320' in commercial_name:
                return '3-series'
        else:
            # Keep the name that was already matched in the first pass.
            return commercial_name_cleaned

Output:

       manufacturer_name_mapped                  commercial_name  ...  emissions  commercial_name_cleaned
    0                      FIAT             124 gt multiair auto  ...        153                      124
    1                      FIAT        500l wagon pop star t-jet  ...        158                      500
    2                      FIAT               doblo combi 1.4 95  ...        165                     None
    3                      FIAT  panda 0.9t sge 85 natural power  ...         86                    panda
    4                      FIAT                 punto 1.4 77 lpg  ...        114                    punto
    5                    BMW AG             x4 xdrive20d se auto  ...        131                       x4
    6                    BMW AG       216d active tourer b37 f45  ...        166                 2-series
    7                    BMW AG         220d gran tourer b47 f46  ...        200                 2-series
    8                    BMW AG               x1 xdrive18d sport  ...        151                       x1
    9                    BMW AG      320i xdrive m sport gt auto  ...        149                 3-series
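As an alternative to the row-wise `apply`, the same rule can be expressed with boolean masks. This is only a sketch under stated assumptions, not the answerer's method: the `replacements` dictionary and the shortened five-row frame are made up for illustration, and the substring checks are as naive as in the question (any name containing '220' would be mapped to '2-series').

```python
import pandas as pd

# Shortened, illustrative subset of the question's data.
df = pd.DataFrame({
    "commercial_name": [
        "doblo combi 1.4 95",
        "x4 xdrive20d se auto",
        "216d active tourer b37 f45",
        "220d gran tourer b47 f46",
        "320i xdrive m sport gt auto",
    ],
    "commercial_name_cleaned": [None, "x4", None, None, None],
})

# Hypothetical lookup table: substring to search for -> standardized model name.
replacements = {"216": "2-series", "220": "2-series", "320": "3-series"}

for pattern, model in replacements.items():
    # Only touch rows that are still missing and whose raw name contains the pattern.
    mask = df["commercial_name_cleaned"].isna() & df["commercial_name"].str.contains(
        pattern, regex=False, na=False
    )
    df.loc[mask, "commercial_name_cleaned"] = model

result = df["commercial_name_cleaned"].tolist()
print(result)
```

Since each mask requires the cleaned value to still be missing, the first matching pattern wins and names matched in the first pass are left untouched. Rows with no match stay missing, and no `str()` cast is needed.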

Regarding "python - Data cleaning in pandas: replacing null values with specific strings if these strings are contained in another column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60400588/
