
python - pandas iterrows throws an error

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 16:10:06

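I am trying to do change data capture on two DataFrames. The logic is to combine the two DataFrames, group by a key, and then loop over the groups with a count > 1 to see which column was "updated". I am getting a strange error. Any help is appreciated. Code: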

import pandas as pd
import numpy as np

pd.set_option('display.height', 1000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
print("reading wolverine xlxs")


# defining metadata

df_header = ['DisplayName','StoreLanguage','Territory','WorkType','EntryType','TitleInternalAlias',
'TitleDisplayUnlimited','LocalizationType','LicenseType','LicenseRightsDescription',
'FormatProfile','Start','End','PriceType','PriceValue','SRP','Description',
'OtherTerms','OtherInstructions','ContentID','ProductID','EncodeID','AvailID',
'Metadata', 'AltID', 'SuppressionLiftDate','SpecialPreOrderFulfillDate','ReleaseYear','ReleaseHistoryOriginal','ReleaseHistoryPhysicalHV',
'ExceptionFlag','RatingSystem','RatingValue','RatingReason','RentalDuration','WatchDuration','CaptionIncluded','CaptionExemption','Any','ContractID',
'ServiceProvider','TotalRunTime','HoldbackLanguage','HoldbackExclusionLanguage']
df_w01 = pd.read_excel("wolverine_1.xlsx", names = df_header)

df_w02 = pd.read_excel("wolverine_2.xlsx", names = df_header)

df_w01['version'] = 'OLD'
df_w02['version'] = 'NEW'

#print(df_w01)
df_m_d = pd.concat([df_w01, df_w02], ignore_index = True)

first_pass = df_m_d[df_m_d.duplicated(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType','FormatProfile'], keep=False)]

first_pass_keep_duplicate = df_m_d[df_m_d.duplicated(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType','FormatProfile'], keep='first')]

group_by_1 = first_pass.groupby(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType','FormatProfile'])
for i,rows in group_by_1.iterrows():
    print("rownumber", i)
    print (rows)


print(first_pass)

The error I get:

AttributeError: Cannot access callable attribute 'iterrows' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any help is greatly appreciated.

Best answer

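iterrows is a DataFrame method; a GroupBy object does not have it. Your GroupBy object does support iteration directly, though, so instead of

for i,rows in group_by_1.iterrows():
    print("rownumber", i)
    print (rows)

you need to do something like

for name, group in group_by_1:
    print(name)
    print(group)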

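Each iteration yields the group key (name) and the rows of that group as a regular DataFrame (group), so you can then do whatever you need with each group.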
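As a rough illustration of what the loop yields, here is a minimal sketch on a small made-up DataFrame. The data and the two-column key are assumptions chosen for brevity, not the asker's actual spreadsheet. When grouping by a list of columns, name is a tuple of key values, and because group is an ordinary DataFrame, iterrows can be called on it.

```python
import pandas as pd

# Hypothetical toy data standing in for the concatenated OLD/NEW frame.
df = pd.DataFrame({
    "Territory": ["US", "US", "CA", "CA"],
    "LicenseType": ["EST", "EST", "VOD", "VOD"],
    "PriceValue": [9.99, 12.99, 4.99, 4.99],
    "version": ["OLD", "NEW", "OLD", "NEW"],
})

grouped = df.groupby(["Territory", "LicenseType"])

seen = []
for name, group in grouped:
    # name is a tuple of key values; group is a regular DataFrame.
    seen.append(name)
    # iterrows exists on DataFrame, so it works on each group.
    for i, row in group.iterrows():
        print(name, i, row["version"], row["PriceValue"])
```

Groups come out sorted by key because groupby sorts by default; pass sort=False to keep the order of first appearance.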

See the pandas docs on iterating through groups.
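For the change-data-capture goal described in the question, one possible approach is to count distinct values per column inside each group. The sketch below is only an illustration under assumptions: the helper changed_columns, the toy data, and the shortened key list are hypothetical and not part of the original post.

```python
import pandas as pd

def changed_columns(group, ignore=("version",)):
    """Return the columns whose values differ between the rows of one group."""
    cols = [c for c in group.columns if c not in ignore]
    # dropna=False counts NaN as a value, so NaN -> value shows up as a change.
    counts = group[cols].nunique(dropna=False)
    return counts[counts > 1].index.tolist()

# Hypothetical key columns and toy data (assumptions for illustration).
keys = ["Territory", "LicenseType"]
old = pd.DataFrame({
    "Territory": ["US", "CA"],
    "LicenseType": ["EST", "VOD"],
    "PriceValue": [9.99, 4.99],
    "End": ["2024-01-01", "2024-06-01"],
})
new = pd.DataFrame({
    "Territory": ["US", "CA"],
    "LicenseType": ["EST", "VOD"],
    "PriceValue": [12.99, 4.99],
    "End": ["2024-01-01", "2024-06-01"],
})
old["version"] = "OLD"
new["version"] = "NEW"

combined = pd.concat([old, new], ignore_index=True)
# Keep only keys that appear in both files, as in the question.
dupes = combined[combined.duplicated(keys, keep=False)]

changes = {name: changed_columns(group) for name, group in dupes.groupby(keys)}
for name, cols in changes.items():
    print(name, cols)
```

An empty list means the OLD and NEW rows for that key are identical apart from the version marker.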

Regarding "python - pandas iterrows throws an error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39400957/
