
python - Using applymap in pandas on an entire DataFrame with an if condition

Reposted. Author: 行者123. Updated: 2023-11-28 16:56:42

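I have a pandas DataFrame. I am cleaning the data by calling applymap with a custom function on every element, and I want to store the cleaned values in separate columns.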

tag0      tag1                           tag2                      tag3
1.Kharif  3.Pest and Disease Management  4.Grasshopper             2.Paddy
1.Kharif  2.Brinjal                      3.Crop Growth Management
1.Kharif  3.Pest and Disease Management  4.Caterpillar             2.Black Gram
1.Kharif  3.Pest and Disease Management  4.Caterpillar             2.Cotton

The above is a part of the whole DataFrame.

I wrote the following function.

def tag_cleaner(tag):
    '''
    this function takes an argument called tag and checks if it starts with 1 then
    it puts it in a new column called season and so on. It is performed row-wise
    and at the end the dataframe will have columnar values
    '''
    if tag.startswith('1'):
        df_tags['season'] = tag
    elif tag.startswith('2'):
        df_tags['crop'] = tag
    elif tag.startswith('3'):
        df_tags['maintopic'] = tag
    elif tag.startswith('4'):
        df_tags['subtopic'] = tag
    elif tag.startswith('5'):
        df_tags['issue'] = tag
    else:
        return tag

Then I applied it with applymap:

df_tags.applymap(tag_cleaner)

I want the output to look like this:

season     crop           maintopic                       subtopic
1. Kharif  2.Paddy        3. Pest and Disease Management  4. Grasshopper
1. Kharif  2. Brinjal     3. Crop Growth Management       NA
1. Kharif  2. Black Gram  3. Pest and Disease Management  4. Catterpillar
1. Kharif  2. Cotton      3. Pest and Disease Management  4. Catterpillar

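The command does create the new columns as needed, but every row ends up with the same values. It looks like this, with the same row repeated across the entire DataFrame.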

season    crop     maintopic                 subtopic
1.Kharif  2.Paddy  3.Crop Growth Management  4. Caterpillar

I also get this error:

AttributeError: ("'float' object has no attribute 'startswith'", 'occurred at index tag2')

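I am a beginner and cannot tell what is wrong. I think I made a logic error in the function I defined, which is why the value from the function's last call gets copied into the entire column. Please help.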
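Both symptoms can be reproduced in isolation. The sketch below is only an illustration, not part of the original post: df_demo is a hypothetical two-row frame built from the sample rows above. It shows that assigning a single string to a column fills every row with it, and that an empty cell is a float NaN, which has no startswith method.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame built from the sample data in the question
df_demo = pd.DataFrame({
    "tag2": ["4.Grasshopper", "3.Crop Growth Management"],
    "tag3": ["2.Paddy", np.nan],
})

# Symptom 1: assigning a scalar to a column broadcasts it to every row,
# so the last value assigned is the one seen in the whole column.
df_demo["crop"] = "2.Paddy"
broadcast_values = df_demo["crop"].tolist()

# Symptom 2: a missing cell holds NaN, which is a float, not a string.
missing = df_demo.loc[1, "tag3"]
is_float = isinstance(missing, float)
try:
    missing.startswith("1")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(broadcast_values)
print(is_float, error_message)
```

This is why the function needs to place each value in its own row rather than assign to a whole column, and why missing cells have to be skipped or dropped first.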

Best answer

Use:

# Reshape to long format and drop the original column names
df = df.stack().to_frame('a').reset_index(level=1, drop=True).reset_index()
# Take the part before the first '.' as the key
df['b'] = df['a'].str.split('.').str[0]
# Dictionary mapping each numeric prefix to its new column name
d = {'1': 'season', '2': 'crop', '3': 'maintopic', '4': 'subtopic', '5': 'issue'}
# Pivot back to wide format and rename the columns
# (keyword arguments are used because newer pandas versions no longer accept positional ones here)
df = df.pivot(index='index', columns='b', values='a').rename(columns=d).rename_axis(None, axis=1).rename_axis(None)

print(df)
   season    crop          maintopic                      subtopic
0  1.Kharif  2.Paddy       3.Pest and Disease Management  4.Grasshopper
1  1.Kharif  2.Brinjal     3.Crop Growth Management       NaN
2  1.Kharif  2.Black Gram  3.Pest and Disease Management  4.Caterpillar
3  1.Kharif  2.Cotton      3.Pest and Disease Management  4.Caterpillar
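For anyone who wants to run the reshaping end to end, here is a minimal self-contained sketch under a few assumptions: the frame is rebuilt by hand from the question's sample rows, and the names long_df, wide_df, names and the "row" index label are illustrative choices, not from the original answer. An explicit dropna() is added after stack() so the result does not depend on whether the installed pandas version drops missing values while stacking.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, rebuilt by hand
df_tags = pd.DataFrame({
    "tag0": ["1.Kharif", "1.Kharif", "1.Kharif", "1.Kharif"],
    "tag1": ["3.Pest and Disease Management", "2.Brinjal",
             "3.Pest and Disease Management", "3.Pest and Disease Management"],
    "tag2": ["4.Grasshopper", "3.Crop Growth Management",
             "4.Caterpillar", "4.Caterpillar"],
    "tag3": ["2.Paddy", np.nan, "2.Black Gram", "2.Cotton"],
})

# Long format: one row per (original row, tag value); missing cells removed
long_df = (
    df_tags.stack()
    .dropna()
    .to_frame("a")
    .reset_index(level=1, drop=True)  # discard tag0..tag3 labels
    .rename_axis("row")               # hypothetical name for the row id
    .reset_index()
)

# Numeric prefix before the first '.' decides the target column
long_df["b"] = long_df["a"].str.split(".").str[0]

names = {"1": "season", "2": "crop", "3": "maintopic",
         "4": "subtopic", "5": "issue"}

# Wide format: one column per prefix, renamed to its meaning
wide_df = (
    long_df.pivot(index="row", columns="b", values="a")
    .rename(columns=names)
    .rename_axis(None, axis=1)
    .rename_axis(None)
)

print(wide_df)
```

Because the prefix alone decides the target column, the order of the tags within a row does not matter.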

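Edit: The error means that some rows contain more than one value with the same number. The solution is to use pivot_table with join as the aggregation function: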

print(df)
   tag0      tag1                           tag2                      \
0  1.Kharif  1.Pest and Disease Management  4.Grasshopper
1  1.Kharif  2.Brinjal                      3.Crop Growth Management
2  1.Kharif  3.Pest and Disease Management  4.Caterpillar
3  1.Kharif  3.Pest and Disease Management  4.Caterpillar

   tag3
0  2.Paddy
1  NaN
2  2.Black Gram
3  2.Cotton

df = df.stack().to_frame('a').reset_index(level=1, drop=True).reset_index()
df['b'] = df['a'].str.split('.').str[0]
d = {'1': 'season', '2': 'crop', '3': 'maintopic', '4':'subtopic','5':'issue'}

df = df.pivot_table(index='index',columns='b',values='a', aggfunc=','.join).rename(columns=d).rename_axis(None, axis=1).rename_axis(None)

print(df)
   season                                  crop          \
0  1.Kharif,1.Pest and Disease Management  2.Paddy
1  1.Kharif                                2.Brinjal
2  1.Kharif                                2.Black Gram
3  1.Kharif                                2.Cotton

   maintopic                      subtopic
0  NaN                            4.Grasshopper
1  3.Crop Growth Management       NaN
2  3.Pest and Disease Management  4.Caterpillar
3  3.Pest and Disease Management  4.Caterpillar
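The difference between the two calls can be checked directly. The following is a sketch under assumptions, not the answer's exact code: the frame is rebuilt by hand from the printed data above, and long_df, wide_df, pivot_error and the "row" label are illustrative names. It first shows that plain pivot rejects the duplicated prefix in row 0, then that pivot_table with ','.join merges the duplicates into one cell.

```python
import numpy as np
import pandas as pd

# Data from the edit: row 0 has two values starting with '1'
df = pd.DataFrame({
    "tag0": ["1.Kharif", "1.Kharif", "1.Kharif", "1.Kharif"],
    "tag1": ["1.Pest and Disease Management", "2.Brinjal",
             "3.Pest and Disease Management", "3.Pest and Disease Management"],
    "tag2": ["4.Grasshopper", "3.Crop Growth Management",
             "4.Caterpillar", "4.Caterpillar"],
    "tag3": ["2.Paddy", np.nan, "2.Black Gram", "2.Cotton"],
})

long_df = (
    df.stack()
    .dropna()
    .to_frame("a")
    .reset_index(level=1, drop=True)
    .rename_axis("row")  # hypothetical name for the row id
    .reset_index()
)
long_df["b"] = long_df["a"].str.split(".").str[0]

# pivot cannot place two values in the same (row, prefix) cell
try:
    long_df.pivot(index="row", columns="b", values="a")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

names = {"1": "season", "2": "crop", "3": "maintopic",
         "4": "subtopic", "5": "issue"}

# pivot_table aggregates duplicates; here they are joined with a comma
wide_df = (
    long_df.pivot_table(index="row", columns="b", values="a",
                        aggfunc=",".join)
    .rename(columns=names)
    .rename_axis(None, axis=1)
    .rename_axis(None)
)

print(pivot_error)
print(wide_df)
```

Joining keeps every value visible, which makes mislabeled tags such as the duplicated prefix in row 0 easy to spot afterwards.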

Regarding "python - Using applymap in pandas on an entire DataFrame with an if condition", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57720835/
