
python - Manipulating a DataFrame created from a dict (tuple-float)

Reposted. Author: 行者123. Updated: 2023-12-01 03:53:56

I have been trying to create a DataFrame from a dictionary with the following structure.

imgAlbmShrDict = {('10', 'photo_album_57'): 20.0,
('10', 'photo_album_8'): 20.0,
('1061', 'photo_album_29'): 100.0,
('1061', 'photo_album_90'): 90.0,
('1102', 'photo_album_29'): 80.0,
('1102', 'photo_album_90'): 60.0,
('1300', 'photo_album_15'): 100.0,
('1300', 'photo_album_89'): 60.0,
('1301', 'photo_album_15'): 88.88888888888889,
('1301', 'photo_album_89'): 60.0
}

pd.DataFrame(imgAlbmShrDict, index=['Proportion']).transpose()

                     Proportion
10   photo_album_57   20.000000
     photo_album_8    20.000000
1061 photo_album_29  100.000000
     photo_album_90   90.000000
1102 photo_album_29   80.000000
     photo_album_90   60.000000
1300 photo_album_15  100.000000
     photo_album_89   60.000000
1301 photo_album_15   88.888889
     photo_album_89   60.000000

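The output is exactly what I need, but I cannot extract just the first two columns from the DataFrame. The first column is actually the image ID, and the second is the album in which that image appears.

I need help accessing those columns, and a way to add a column while preserving the structure.

Desired output: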

                     Proportion  URL
10   photo_album_57   20.000000  www.something.com/10.jpeg
     photo_album_8    20.000000
1061 photo_album_29  100.000000  www.something.com/1061.jpeg
     photo_album_90   90.000000
1102 photo_album_29   80.000000  www.something.com/1102.jpeg
     photo_album_90   60.000000
1300 photo_album_15  100.000000  www.something.com/1300.jpeg
     photo_album_89   60.000000
1301 photo_album_15   88.888889  www.something.com/1301.jpeg
     photo_album_89   60.000000

Best answer

You can use get_level_values, because the first two "columns" are really the two levels of a MultiIndex:

print (df.index.get_level_values(0))
Index(['10', '10', '1061', '1061', '1102', '1102', '1300', '1300', '1301',
       '1301'],
      dtype='object')

df['URL'] = 'www.something.com/' + df.index.get_level_values(0) + '.jpg'
print (df)
                     Proportion                         URL
10   photo_album_57   20.000000    www.something.com/10.jpg
     photo_album_8    20.000000    www.something.com/10.jpg
1061 photo_album_29  100.000000  www.something.com/1061.jpg
     photo_album_90   90.000000  www.something.com/1061.jpg
1102 photo_album_29   80.000000  www.something.com/1102.jpg
     photo_album_90   60.000000  www.something.com/1102.jpg
1300 photo_album_15  100.000000  www.something.com/1300.jpg
     photo_album_89   60.000000  www.something.com/1300.jpg
1301 photo_album_15   88.888889  www.something.com/1301.jpg
     photo_album_89   60.000000  www.something.com/1301.jpg
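
As a self-contained illustration of the point above, here is a minimal sketch that rebuilds a four-row subset of the question's data and reads both index levels. The variable names `data`, `ids` and `albums` are chosen here for illustration and do not come from the original answer.

```python
import pandas as pd

# Subset of the question's dictionary: (image ID, album) -> proportion
data = {
    ('10', 'photo_album_57'): 20.0,
    ('10', 'photo_album_8'): 20.0,
    ('1061', 'photo_album_29'): 100.0,
    ('1061', 'photo_album_90'): 90.0,
}

# The tuple keys become two-level column labels; transposing turns them
# into a two-level row index (a MultiIndex), not ordinary columns.
df = pd.DataFrame(data, index=['Proportion']).transpose()

ids = df.index.get_level_values(0)     # level 0: image IDs
albums = df.index.get_level_values(1)  # level 1: album names

# String concatenation works element-wise on an Index of strings
df['URL'] = 'www.something.com/' + ids + '.jpg'
print(df)
```

Because the IDs and album names live in the index, `df['...']` cannot reach them; `get_level_values` is what exposes each level as a flat Index.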

You may also need drop_duplicates:

df = df.drop_duplicates(subset='URL')
print (df)
                     Proportion                         URL
10   photo_album_57   20.000000    www.something.com/10.jpg
1061 photo_album_29  100.000000  www.something.com/1061.jpg
1102 photo_album_29   80.000000  www.something.com/1102.jpg
1300 photo_album_15  100.000000  www.something.com/1300.jpg
1301 photo_album_15   88.888889  www.something.com/1301.jpg
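
Note that drop_duplicates removes whole rows, so the second album of each image disappears from the result. The sketch below, on a four-row subset built here for illustration, shows which rows survive with the default `keep='first'` and with `keep='last'`; the names `first_rows` and `last_rows` are hypothetical.

```python
import pandas as pd

# Four-row subset of the question's data, built directly with a MultiIndex
index = pd.MultiIndex.from_tuples([
    ('10', 'photo_album_57'),
    ('10', 'photo_album_8'),
    ('1061', 'photo_album_29'),
    ('1061', 'photo_album_90'),
])
df = pd.DataFrame({'Proportion': [20.0, 20.0, 100.0, 90.0]}, index=index)
df['URL'] = 'www.something.com/' + df.index.get_level_values(0) + '.jpg'

# keep='first' is the default: the first row for each URL is retained
first_rows = df.drop_duplicates(subset='URL')

# keep='last' retains the last row for each URL instead
last_rows = df.drop_duplicates(subset='URL', keep='last')

print(first_rows)
print(last_rows)
```

If every (image, album) row has to stay, as in the desired output of the question, dropping duplicates is probably not the right tool.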

Another solution is to call reset_index and set the column names:

df.reset_index(inplace=True)
df.columns = ['ID','Album','Proportion']
df['URL'] = 'www.something.com/' + df['ID'] + '.jpg'
print (df)
     ID           Album  Proportion                         URL
0    10  photo_album_57   20.000000    www.something.com/10.jpg
1    10   photo_album_8   20.000000    www.something.com/10.jpg
2  1061  photo_album_29  100.000000  www.something.com/1061.jpg
3  1061  photo_album_90   90.000000  www.something.com/1061.jpg
4  1102  photo_album_29   80.000000  www.something.com/1102.jpg
5  1102  photo_album_90   60.000000  www.something.com/1102.jpg
6  1300  photo_album_15  100.000000  www.something.com/1300.jpg
7  1300  photo_album_89   60.000000  www.something.com/1300.jpg
8  1301  photo_album_15   88.888889  www.something.com/1301.jpg
9  1301  photo_album_89   60.000000  www.something.com/1301.jpg
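
A possible variation on the reset_index approach, sketched here as an assumption rather than as part of the original answer: name the index levels with rename_axis first, so reset_index produces the `ID` and `Album` columns directly and the original frame is left unchanged. The name `flat` is hypothetical.

```python
import pandas as pd

# Four-row subset of the question's data, built directly with a MultiIndex
index = pd.MultiIndex.from_tuples([
    ('10', 'photo_album_57'),
    ('10', 'photo_album_8'),
    ('1061', 'photo_album_29'),
    ('1061', 'photo_album_90'),
])
df = pd.DataFrame({'Proportion': [20.0, 20.0, 100.0, 90.0]}, index=index)

# Name the two index levels, then move them into ordinary columns.
# Neither call modifies df; the result is a new, flat DataFrame.
flat = df.rename_axis(['ID', 'Album']).reset_index()

# The ID is now a regular column, so plain column access works
flat['URL'] = 'www.something.com/' + flat['ID'] + '.jpg'
print(flat)
```

This avoids assigning to `df.columns` by position, which would silently mislabel data if the column order ever changed.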

EDIT 1:

Thank you, stephen, for the solution.

I tried to improve on it with boolean indexing and Index.duplicated:

mask = ~df.index.get_level_values(0).duplicated()
print (mask)
[ True False True False True False True False True False]

subindex = df.index[mask]

df.loc[subindex, 'URL'] = 'www.something.com/' + subindex.get_level_values(0) + '.jpg'
# Assign the result back; an inplace fillna on df.URL may not update df in recent pandas versions
df['URL'] = df['URL'].fillna('')
print (df)
                     Proportion                         URL
10   photo_album_57   20.000000    www.something.com/10.jpg
     photo_album_8    20.000000
1061 photo_album_29  100.000000  www.something.com/1061.jpg
     photo_album_90   90.000000
1102 photo_album_29   80.000000  www.something.com/1102.jpg
     photo_album_90   60.000000
1300 photo_album_15  100.000000  www.something.com/1300.jpg
     photo_album_89   60.000000
1301 photo_album_15   88.888889  www.something.com/1301.jpg
     photo_album_89   60.000000
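
The same result can be reached in one assignment. The following is a minimal sketch that swaps in numpy.where for the loc-plus-fillna steps; it is not the original answer's method, and the name `first_of_id` is hypothetical.

```python
import numpy as np
import pandas as pd

# Four-row subset of the question's data, built directly with a MultiIndex
index = pd.MultiIndex.from_tuples([
    ('10', 'photo_album_57'),
    ('10', 'photo_album_8'),
    ('1061', 'photo_album_29'),
    ('1061', 'photo_album_90'),
])
df = pd.DataFrame({'Proportion': [20.0, 20.0, 100.0, 90.0]}, index=index)

ids = df.index.get_level_values(0)

# True only for the first row of each image ID
first_of_id = ~ids.duplicated()

# URL on the first row of each ID, empty string on the remaining rows
df['URL'] = np.where(first_of_id, 'www.something.com/' + ids + '.jpg', '')
print(df)
```

All rows and the MultiIndex are preserved; only the repeated URLs are blanked out.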

For more on python - Manipulating a DataFrame created from a dict (tuple-float), see the similar question found on Stack Overflow: https://stackoverflow.com/questions/37826045/
