python - pandas.replace vs. str.replace regex conflict: order of code

Reposted. Author: 行者123. Updated: 2023-11-30 22:32:40

My task is to remove everything in parentheses and all digits that follow a country name, and to rename a few countries.

For example, 'Bolivia (Plurinational State of)' should become 'Bolivia', and 'Switzerland17' should become 'Switzerland'.

My original code ran in this order:

dict1 = {
"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"}

energy['Country'] = energy['Country'].replace(dict1)
energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
energy.loc[energy['Country'] == 'United States']

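The str.replace part works fine, and that part of the task is done. But when I use the last line to check whether the country names were changed, the original code does not work. However, if I change the order of the code to: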

energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
energy['Country'] = energy['Country'].replace(dict1)

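then the country names are changed successfully. So there must be something wrong with my regex syntax. How can I resolve this conflict, and why does it happen?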

Best answer

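The problem is that replace needs regex=True in order to replace substrings; without it, a dictionary key must match the entire cell value: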

import pandas as pd

energy = pd.DataFrame({'Country':['United States of America4',
                                  'United States of America (aaa)','Slovakia']})
print (energy)
Country
0 United States of America4
1 United States of America (aaa)
2 Slovakia

dict1 = {
"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"}
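Without regex=True, replace changes nothing while the digits and parentheses are still present:

# nothing is replaced because no value matches exactly (digits and parentheses remain)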
energy['Country'] = energy['Country'].replace(dict1)
print (energy)
Country
0 United States of America4
1 United States of America (aaa)
2 Slovakia

energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
print (energy)
Country
0 United States of America
1 United States of America
2 Slovakia

print (energy.loc[energy['Country'] == 'United States'])
Empty DataFrame
Columns: [Country]
Index: []
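
The empty result follows from how Series.replace matches dictionary keys. A minimal sketch of the difference, using a small made-up Series (the names s and mapping are illustrative and not part of the original answer):

```python
import pandas as pd

s = pd.Series(["United States of America4", "United States of America"])
mapping = {"United States of America": "United States"}

# Default behaviour: a key must equal the entire cell value,
# so only the exact match in the second row is replaced.
whole = s.replace(mapping)

# regex=True: keys are treated as regular expressions and may match
# a substring, so the first row is rewritten as well.
partial = s.replace(mapping, regex=True)

print(whole.tolist())
print(partial.tolist())
```

Note that with regex=True the dictionary keys are interpreted as patterns, so keys containing characters such as parentheses or dots would need escaping.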
With regex=True, replace also matches substrings, so the original order works:

energy['Country'] = energy['Country'].replace(dict1, regex=True)
print (energy)
Country
0 United States4
1 United States (aaa)
2 Slovakia

energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
print (energy)
Country
0 United States
1 United States
2 Slovakia

print (energy.loc[energy['Country'] == 'United States'])
Country
0 United States
1 United States
Alternatively, clean the data first and then call replace without regex=True:

# clean the data first
energy['Country'] = energy['Country'].str.replace(r' \(.*\)', '', regex=True)
energy['Country'] = energy['Country'].str.replace(r'\d+', '', regex=True)
print (energy)
Country
0 United States of America
1 United States of America
2 Slovakia

# replace now works because the cleaned values match the keys exactly
energy['Country'] = energy['Country'].replace(dict1)
print (energy)
Country
0 United States
1 United States
2 Slovakia

print (energy.loc[energy['Country'] == 'United States'])
Country
0 United States
1 United States
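
The clean-first order can be wrapped into a single helper. This is only a sketch: the function name clean_country, the sample rows, and the extra str.strip() call are assumptions added for illustration and do not come from the original answer.

```python
import pandas as pd

def clean_country(col: pd.Series, mapping: dict) -> pd.Series:
    """Strip parenthesised text and digits, then rename whole values."""
    return (
        col.str.replace(r"\s*\(.*\)", "", regex=True)  # drop "(...)" and the space before it
           .str.replace(r"\d+", "", regex=True)        # drop digits such as footnote markers
           .str.strip()                                # assumption: also trim stray whitespace
           .replace(mapping)                           # exact, whole-value rename
    )

# Hypothetical sample data modelled on the examples in the question.
energy = pd.DataFrame({"Country": ["Bolivia (Plurinational State of)",
                                   "Switzerland17",
                                   "United States of America20"]})
dict1 = {"United States of America": "United States"}

energy["Country"] = clean_country(energy["Country"], dict1)
print(energy["Country"].tolist())
```

Keeping the dictionary rename as the last step means its keys stay plain strings and never have to be escaped as regular expressions.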

This question, python - pandas.replace vs. str.replace regex conflict: order of code, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/45397416/
