
Python pandas - value_counts not working properly

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:11:01

Based on this post on Stack Overflow, I tried the value_counts function like this:

df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))

It works fine, except that my data has 22 unique genres, yet after splitting I get 42 values, which of course are not unique. Data sample:

     Action  Adventure   Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG     Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing Accounting  Action  Adventure   Animation & Modeling    Audio Production    Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing  nan
0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

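(I only pasted the header and the first row.)

I have a feeling the problem is caused by my raw data. My column (genres) is a list of lists whose values include the brackets.

Example: [Action, Indie]. When Python reads it, it treats "[Action", "Action" and "Action]" as different values, and the output is 303 different values. So what I did was: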

new = []
for i in df1['genres'].tolist():
    if str(i) != 'nan':
        # drop the first and last character (the brackets)
        i = i[1:-1]
        new.append(i)
    else:
        new.append('nan')
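The duplication described above can be reproduced with a minimal sketch. The sample strings are hypothetical and only mimic the bracketed format from the question; they are not taken from the real data. Splitting on "," alone leaves the brackets and the leading spaces inside the tokens, so one genre shows up under several spellings.

```python
import pandas as pd

# Hypothetical sample mimicking the bracketed strings described in the question.
genres = pd.Series(["[Action, Indie]", "[Indie, Action]", "[Action]"])

# Splitting on "," alone keeps brackets and leading spaces in the tokens.
raw_tokens = genres.str.split(",").explode()
raw_unique = sorted(raw_tokens.unique())

# Stripping the brackets first, then trimming each token,
# collapses the variants back into the real genre names.
clean_tokens = genres.str.strip("[]").str.split(",").explode().str.strip()
clean_unique = sorted(clean_tokens.unique())

print(raw_unique)
print(clean_unique)
```

In this sketch the raw split yields five distinct tokens for only two real genres, which is the same kind of inflation as the 42 columns for 22 genres reported above.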

Best answer

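You have to remove the leading and trailing [] from the column genres with the function str.strip, and then replace the spaces with an empty string using the function str.replace: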

import pandas as pd

df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")


df['genres'] = df['genres'].str.strip('[]')
df['genres'] = df['genres'].str.replace(' ', '')

df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))

# temporarily display up to 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)

# the DataFrame output is omitted for clarity; only the columns are shown
print(df.columns)
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
u'initial_price', u'is_free', u'metacritic', u'release_date',
u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
u'WebPublishing'],
dtype='object')
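Removing every space also alters multi-word genre names, which is why the columns above read FreetoPlay and Animation&Modeling. As a possible variation (not part of the original answer), the sketch below removes only the whitespace around the commas and builds the columns with Series.str.get_dummies. The DataFrame here is hypothetical sample data, and the result holds 0/1 indicators rather than counts, which match value_counts only when no genre repeats within a row.

```python
import pandas as pd

# Hypothetical data; the real CSV from the answer is not reproduced here.
df = pd.DataFrame({
    "appid": [1, 2, 3],
    "genres": ["[Action, Free to Play]", "[Indie, Action]", None],
})

cleaned = (
    df["genres"]
    .str.strip("[]")                            # drop the outer brackets
    .str.replace(r"\s*,\s*", ",", regex=True)   # remove spaces around commas only
)

# One 0/1 indicator column per genre; rows with a missing value become all zeros.
indicators = cleaned.str.get_dummies(sep=",")
result = df.join(indicators)

print(result.columns.tolist())
```

With this approach the genre names keep their internal spaces, so the new columns can be matched against the original labels directly.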

Regarding "Python pandas - value_counts not working properly", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34089108/
