python - A more efficient way to clean a DataFrame than loc

Reposted. Author: 行者123. Updated: 2023-11-28 20:14:22

My code is as follows:

import pandas as pd
df = pd.read_excel("Energy Indicators.xls", header=None, footer=None)
c_df = df.copy()
c_df = c_df.iloc[18:245, 2:]
c_df = c_df.rename(columns={2: 'Country', 3: 'Energy Supply', 4:'Energy Supply per Capita', 5:'% Renewable'})
c_df['Energy Supply'] = c_df['Energy Supply'].apply(lambda x: x*1000000)
c_df.loc[c_df['Country'] == 'Korea, Rep.'] = 'South Korea'
c_df.loc[c_df['Country'] == 'United States of America20'] = 'United States'
c_df.loc[c_df['Country'] == 'United Kingdom of Great Britain and Northern Ireland'] = 'United Kingdom'
c_df.loc[c_df['Country'] == 'China, Hong Kong Special Administrative Region'] = 'Hong Kong'
c_df.loc[c_df['Country'] == 'Venezuela (Bolivarian Republic of)'] = 'Venezuela'
c_df.loc[c_df['Country'] == 'Bolivia (Plurinational State of)'] = 'Bolivia'
c_df.loc[c_df['Country'] == 'Switzerland17'] = 'Switzerland'
c_df.loc[c_df['Country'] == 'Australia1'] = 'Australia'
c_df.loc[c_df['Country'] == 'China2'] = 'China'
c_df.loc[c_df['Country'] == 'Falkland Islands (Malvinas)'] = 'Bolivia'
c_df.loc[c_df['Country'] == 'Greenland7'] = 'Greenland'
c_df.loc[c_df['Country'] == 'Iran (Islamic Republic of'] = 'Iran'
c_df.loc[c_df['Country'] == 'Italy9'] = 'Italy'
c_df.loc[c_df['Country'] == 'Japan10'] = 'Japan'
c_df.loc[c_df['Country'] == 'Kuwait11'] = 'Kuwait'
c_df.loc[c_df['Country'] == 'Micronesia (Federal States of)'] = 'Micronesia'
c_df.loc[c_df['Country'] == 'Netherlands12'] = 'Netherlands'
c_df.loc[c_df['Country'] == 'Portugal13'] = 'Portugal'
c_df.loc[c_df['Country'] == 'Saudi Arabia14'] = 'Saudi Arabia'
c_df.loc[c_df['Country'] == 'Serbia15'] = 'Serbia'
c_df.loc[c_df['Country'] == 'Sint Maarteen (Dutch part)'] = 'Sint Marteen'
c_df.loc[c_df['Country'] == 'Spain16'] = 'Spain'
c_df.loc[c_df['Country'] == 'Ukraine18'] = 'Ukraine'
c_df.loc[c_df['Country'] == 'Denmark5'] = 'Denmark'
c_df.loc[c_df['Country'] == 'France6'] = 'France'
c_df.loc[c_df['Country'] == 'Indonesia8'] = 'Indonesia'

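I feel there must be a simpler way to change the values for countries whose names contain parentheses or digits. Which pandas method could I use to find the names in a column that contain digits or parentheses? isin?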

Best answer

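You can start by stripping the digits and the text in parentheses. After that, for everything else that needs a non-trivial replacement, declare a mapping and apply it with pd.Series.replace.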

mapper = {'Korea, Rep.': 'South Korea', 'Falkland Islands': 'Bolivia', ...}

df['Country'] = (
    df['Country']
    .str.replace(r'\d+|\s*\(.*\)', '', regex=True)  # regex=True is required on pandas >= 2.0
    .str.strip()
    .replace(mapper)
)

Simple, and done.
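To make the two steps concrete, here is a minimal, self-contained sketch on a hypothetical five-row sample built from names in the question's code. The sample frame, the contents of `mapper`, and the `leftover` check are illustrative assumptions, not part of the original answer. It passes `regex=True` explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical sample: a few raw names copied from the question's code.
df = pd.DataFrame({
    'Country': [
        'Korea, Rep.',
        'United States of America20',
        'Bolivia (Plurinational State of)',
        'Switzerland17',
        'France6',
    ]
})

# Only names that are still wrong after the regex cleanup need an entry here.
# Series.replace matches whole values, so each key must equal the cleaned name exactly.
mapper = {
    'Korea, Rep.': 'South Korea',
    'United States of America': 'United States',
}

df['Country'] = (
    df['Country']
    .str.replace(r'\d+|\s*\(.*\)', '', regex=True)  # drop digits and "(...)" chunks
    .str.strip()                                     # trim leftover whitespace
    .replace(mapper)                                 # exact-match renames
)

# Names that still contain a digit or a parenthesis would need manual attention.
leftover = df.loc[df['Country'].str.contains(r'[\d()]', regex=True), 'Country']

print(df['Country'].tolist())
print(leftover.tolist())
```

As for the question about finding such names: `Series.str.contains` with a regular expression is probably a better fit than `isin`, which only tests for exact membership in a list of values.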

Details

\d+      # one or more digits
|        # regex OR (alternation)
\s*      # zero or more whitespace characters
\(       # literal opening parenthesis
.*       # any characters (greedy)
\)       # literal closing parenthesis
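The breakdown above can be checked outside pandas with the standard `re` module. This is a rough sketch: the `clean` helper and the sample list are hypothetical, and the pattern is the same one written in verbose form. The last sample, copied from the question's code, has no closing parenthesis, so the parenthesised alternative does not match and the name passes through unchanged. Entries like that would still need a line in the mapping.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE so each piece is annotated.
PATTERN = re.compile(r"""
    \d+        # one or more digits
    |          # OR
    \s*        # optional whitespace before the parenthesis
    \(.*\)     # a parenthesised chunk, matched greedily
""", re.VERBOSE)


def clean(name):
    """Remove digits and parenthesised text, then trim whitespace."""
    return PATTERN.sub('', name).strip()


samples = [
    'Japan10',
    'Venezuela (Bolivarian Republic of)',
    'Iran (Islamic Republic of',  # unbalanced parenthesis, as in the question
]
results = [clean(s) for s in samples]
print(results)
```

Because `.*` is greedy, a name containing two parenthesised parts would lose everything from the first `(` to the last `)`. That is harmless for the names shown in the question, but worth keeping in mind for other data.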

Regarding "python - A more efficient way to clean a DataFrame than loc", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50161569/
