
python - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas (Python)

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:13:07

I'm working with a JSON file, and I run this code on it to get the following DataFrame:

import pandas as pd

topics = df.set_index('username').popular_board_data.str.extractall(r'name":"([^,]*)')
total = df.set_index('username').popular_board_data.str.extractall(r'totalCount\":([^,}]*)')

data = []
for username in df.username.unique():
    # pair each topic name with its total count for this user
    for topic in zip(topics[0][username], total[0][username]):
        data.append([username, topic])

df_topic = pd.DataFrame(data, columns='username,topic'.split(','))

   username  topic
0  lukl      (Hardware", 80)
1  lukl      (Marketplace", 31)
2  lukl      (Atari 5200", 27)
3  lukl      (Atari 8-Bit Computers", 9)
4  lukl      (Modern Gaming", 3)

Now I need to split the information in the "topic" column into two separate columns.

This is the expected result:

   username  topic               _topic       _total
0  lukl      (Hardware", 80)     Hardware     80
1  lukl      (Marketplace", 31)  Marketplace  31
2  lukl      (Atari 5200", 27)   Atari 5200   27
3  lukl      (Atari 8", 9)       Atari 8      9
4  lukl      (Modern", 3)        Modern       3

I wanted to do it with this code:

df_top = df_topic.copy()
df_top['_topic'] = df_topic['topic'].str.split('(').str[1].str.split('",').str[0]
df_top['_total'] = df_topic['topic'].str.split('",').str[1].str.split(')').str[0]
df_top

But I get this error:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Best answer

I think the column contains tuples rather than strings, so just use the DataFrame constructor:

df_topic[['_topic', '_total']] = pd.DataFrame(df_topic['topic'].values.tolist(),
                                              index=df_topic.index)
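
As a minimal, self-contained sketch of the constructor approach above: the sample rows below are hypothetical stand-ins shaped like the question's `df_topic`, and the final cleanup steps (stripping the stray `"` and casting the counts to integers) are assumptions about the desired output, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample shaped like the question's df_topic:
# each 'topic' value is a tuple, not a string.
df_topic = pd.DataFrame({
    "username": ["lukl", "lukl", "lukl"],
    "topic": [('Hardware"', "80"), ('Marketplace"', "31"), ('Atari 5200"', "27")],
})

# Check what the column really holds before reaching for .str
element_types = df_topic["topic"].map(type).unique()

# Expand each tuple into its own pair of columns, keeping the original index
expanded = pd.DataFrame(
    df_topic["topic"].tolist(),
    index=df_topic.index,
    columns=["_topic", "_total"],
)

# Assumed cleanup: drop the trailing quote left by the regex, make counts numeric
expanded["_topic"] = expanded["_topic"].str.rstrip('"')
expanded["_total"] = expanded["_total"].astype(int)

df_top = df_topic.join(expanded)
```

Inspecting `element_types` first is a quick way to confirm the diagnosis: if it reports `tuple`, string methods such as `.str.split` are the wrong tool for that column.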

A better solution is to take the data from the previous answer and use concat together with DataFrame.reset_index:

df = [{"username": "last",
"popular_board_data": "{\"boards\":[{\"postCount\":\"75\",\"topicCount\":\"5\",\"name\":\"Hardware\",\"url\",\"totalCount\":80},{\"postCount\":\"20\",\"topicCount\":\"11\",\"name\":\"Marketplace\",\"url\",\"totalCount\":31},{\"postCount\":\"26\",\"topicCount\":\"1\",\"name\":\"Atari 5200\",\"url\",\"totalCount\":27},{\"postCount\":\"9\",\"topicCount\":0,\"name\":\"Atari 8\",\"url\"\"totalCount\":9}"
},
{"username": "truk",
"popular_board_data": "{\"boards\":[{\"postCount\":\"351\",\"topicCount\":\"11\",\"name\":\"Atari 2600\",\"url\",\"totalCount\":362},{\"postCount\":\"333\",\"topicCount\":\"22\",\"name\":\"Hardware\",\"url\",\"totalCount\":355},{\"postCount\":\"194\",\"topicCount\":\"8\",\"name\":\"Marketplace\",\"url\",\"totalCount\":202}"
}]
df = pd.DataFrame(df)

# added a closing " to the pattern so it is dropped from the output
topics = df.set_index('username').popular_board_data.str.extractall(r'name":"([^,]*)"')
total = df.set_index('username').popular_board_data.str.extractall(r'totalCount\":([^,}]*)')

df1 = pd.concat([topics[0], total[0]], axis=1, keys=['_topic', '_total'])
df1 = df1.reset_index(level=1, drop=True).reset_index()
print(df1)
   username  _topic       _total
0  last      Hardware     80
1  last      Marketplace  31
2  last      Atari 5200   27
3  last      Atari 8      9
4  truk      Atari 2600   362
5  truk      Hardware     355
6  truk      Marketplace  202
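
One thing to keep in mind with this approach is that regex capture groups come back as strings, so `_total` is text until it is converted. The sketch below repeats the same extractall/concat steps on a shortened, hypothetical stand-in for the `popular_board_data` strings and then converts the counts with `pd.to_numeric`; the conversion step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the popular_board_data strings
df = pd.DataFrame({
    "username": ["last", "truk"],
    "popular_board_data": [
        '{"boards":[{"name":"Hardware","totalCount":80},'
        '{"name":"Marketplace","totalCount":31}]}',
        '{"boards":[{"name":"Atari 2600","totalCount":362}]}',
    ],
})

boards = df.set_index("username")["popular_board_data"]

# Same patterns as in the answer: one row per match, indexed by (username, match)
topics = boards.str.extractall(r'name":"([^,]*)"')
total = boards.str.extractall(r'totalCount":([^,}]*)')

# Align both results side by side, then drop the match level
df1 = pd.concat([topics[0], total[0]], axis=1, keys=["_topic", "_total"])
df1 = df1.reset_index(level=1, drop=True).reset_index()

# Captured values are strings; convert the counts to numbers
df1["_total"] = pd.to_numeric(df1["_total"])
```

After the conversion, `_total` can be summed or sorted numerically, for example with `df1.groupby("username")["_total"].sum()`.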

Regarding python - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas (Python), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57018535/
