
python - Pandas : Making Decision on groupby size()

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:09:56

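I am trying to do "change data capture" with two spreadsheets. I grouped the resulting dataframe, but ran into a strange problem. Requirements:

Case 1) size of a group == 2: do certain tasks

Case 2) size of a group == 1: do certain tasks

Case 3) size of a group > 2: do certain tasks

The problem is that, no matter what I try, I cannot break the result of the groupby down by group size and then loop through it.

I would like to do something like this: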

# pseudocode
if group_by_1.filter(lambda x: len(x) == 2):
    for grp, rows in sub(??)group:
        for j in range(len(rows) - 1):
            # check rows[j, 'column1'] != rows[j + 1, 'column1']:
            do something

Here is my code snippet. Any help is much appreciated.

import pandas as pd
import numpy as np

# 'display.height' is no longer a pandas option; 'display.max_rows' covers it
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
print("reading wolverine xlsx")


# defining metadata

df_header = ['DisplayName','StoreLanguage','Territory','WorkType','EntryType','TitleInternalAlias',
             'TitleDisplayUnlimited','LocalizationType','LicenseType','LicenseRightsDescription',
             'FormatProfile','Start','End','PriceType','PriceValue','SRP','Description',
             'OtherTerms','OtherInstructions','ContentID','ProductID','EncodeID','AvailID',
             'Metadata', 'AltID', 'SuppressionLiftDate','SpecialPreOrderFulfillDate','ReleaseYear','ReleaseHistoryOriginal','ReleaseHistoryPhysicalHV',
             'ExceptionFlag','RatingSystem','RatingValue','RatingReason','RentalDuration','WatchDuration','CaptionIncluded','CaptionExemption','Any','ContractID',
             'ServiceProvider','TotalRunTime','HoldbackLanguage','HoldbackExclusionLanguage']
df_w01 = pd.read_excel("wolverine_1.xlsx", names = df_header)

df_w02 = pd.read_excel("wolverine_2.xlsx", names = df_header)





df_w01['version'] = 'OLD'
df_w02['version'] = 'NEW'

#print(df_w01)
df_m_d = pd.concat([df_w01, df_w02], ignore_index = True).reset_index()

#print(df_m_d)

# This dataframe holds the records that are duplicated across NEW and OLD
# (every repeat after the first occurrence of a key)
first_pass_get_duplicates = df_m_d[df_m_d.duplicated(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType',
    'LicenseRightsDescription','FormatProfile','Start','End','PriceType','PriceValue','ContentID','ProductID',
    'AltID','ReleaseHistoryPhysicalHV','RatingSystem','RatingValue','CaptionIncluded'], keep='first')]
#print(first_pass_get_duplicates)

# This dataframe holds the records that are unique on the desired columns
# (keep=False drops every row whose key occurs more than once)
first_pass_drop_duplicate = df_m_d.drop_duplicates(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType',
    'LicenseRightsDescription','FormatProfile','Start','End','PriceType','PriceValue','ContentID','ProductID',
    'AltID','ReleaseHistoryPhysicalHV','RatingSystem','RatingValue','CaptionIncluded'], keep=False)

#print(first_pass_drop_duplicate)


group_by_1 = first_pass_drop_duplicate.groupby(['StoreLanguage','Territory','TitleInternalAlias','LocalizationType','LicenseType','FormatProfile'],as_index=False)
#Best Case group_by has 2 elements on big key and at least one row is 'new'
#print(group_by_1.grouper.group_info[0])
#for i,rows in group_by_1:

#if(.transform(lambda x : len(x)==2)):
#print(group_by_1.grouper.group_info[0])

#print(group_by_1.describe())

'''for i,rows in group_by_1:
    temp_rows = rows.reset_index()
    temp_rows.reindex(index=range(0,len(rows)))
    print("group has: ", len(temp_rows))
    for j in range(len(rows)-1):
        print(j)
        print("this iteration: ", temp_rows.loc[j,'Start'])
        print("next iteration: ", temp_rows.loc[j+1,'Start'])
        if(temp_rows.loc[j+1,'Start'] == temp_rows.loc[j,'Start']):
            print("Match")
        else:
            print("no_match")
            print(temp_rows.loc[j,'Start'])
    print("++++-----++++")'''
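The snippet above relies on the difference between duplicated(..., keep='first') and drop_duplicates(..., keep=False). A minimal sketch of that difference follows. The toy frame and its 'key' column are hypothetical stand-ins for the long list of comparison columns, not data from the question.

```python
import pandas as pd

# Hypothetical toy data: 'key' stands in for the list of comparison columns.
toy = pd.DataFrame({
    "key": ["a", "a", "b", "c", "c", "c"],
    "version": ["OLD", "NEW", "OLD", "OLD", "NEW", "NEW"],
})

# keep='first' flags every repeat except the first occurrence of a key.
repeats = toy[toy.duplicated(["key"], keep="first")]

# keep=False drops every row whose key occurs more than once,
# so only rows that are unique on the key survive.
unique_only = toy.drop_duplicates(["key"], keep=False)

print(repeats.index.tolist())      # [1, 4, 5]
print(unique_only.index.tolist())  # [2]
```

Note that the two results are not complements of each other: the first occurrence of each repeated key (rows 0 and 3 here) appears in neither frame.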


Best answer

Use groupby with transform and np.size

Consider the dataframe df

df = pd.DataFrame([
    [1, 2, 3],
    [1, 2, 3],
    [2, 3, 4],
    [2, 3, 4],
    [2, 3, 4],
    [3, 4, 5],
], columns=list('abc'))

and the function my_function

def my_function(df):
    # inside apply, df.name is the key of the current group
    if df.name == 1:
        return 'blue'
    elif df.name == 2:
        return 'red'
    else:
        return 'green'

The grouping key is grouper, which holds the size of each row's group

grouper = df.groupby('a').b.transform(np.size)
grouper

0    2
1    2
2    3
3    3
4    3
5    1
Name: b, dtype: int64
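Because grouper is aligned with the rows of df, it can also serve as a boolean mask. The sketch below splits the example frame into the three size classes from the question. It uses the string alias 'size' in place of np.size, which is a substitution on my part, and the names singles, pairs and larger are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 3], [2, 3, 4], [2, 3, 4], [2, 3, 4], [3, 4, 5]],
    columns=list("abc"),
)

# One value per row: the number of rows in that row's group.
group_size = df.groupby("a")["b"].transform("size")

singles = df[group_size == 1]   # rows whose group has exactly one row
pairs = df[group_size == 2]     # rows whose group has exactly two rows
larger = df[group_size > 2]     # rows whose group has more than two rows

print(singles.index.tolist())  # [5]
print(pairs.index.tolist())    # [0, 1]
print(larger.index.tolist())   # [2, 3, 4]
```

Each subset can then be grouped again on the original key if per-group processing is needed.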

df.groupby(grouper).apply(my_function)

b
1     blue
2      red
3    green
dtype: object

You should be able to piece these together to get what you want.
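One way the pieces might fit the three cases in the question is to iterate over the groups and branch on each group's row count. This is only a sketch under assumptions: the frame, the single 'key' column (standing in for the six grouping columns), the classify helper, and the returned labels are all hypothetical placeholders for the real tasks.

```python
import pandas as pd

# Hypothetical stand-in for first_pass_drop_duplicate; 'key' replaces the
# six grouping columns and 'Start' is the column being compared.
frame = pd.DataFrame({
    "key": ["k1", "k1", "k2", "k3", "k3", "k3"],
    "version": ["OLD", "NEW", "NEW", "OLD", "NEW", "NEW"],
    "Start": ["2016-01-01", "2016-02-01", "2016-03-01",
              "2016-04-01", "2016-04-01", "2016-05-01"],
})


def classify(rows):
    """Return a label for one group, chosen by the group's row count."""
    rows = rows.reset_index(drop=True)  # positions 0..n-1 for .loc access
    if len(rows) == 1:
        return "single"
    if len(rows) == 2:
        # Compare the two rows on the column of interest.
        changed = rows.loc[0, "Start"] != rows.loc[1, "Start"]
        return "changed" if changed else "unchanged"
    return "needs_review"


# Iterating a groupby yields (group key, sub-frame) pairs.
results = {key: classify(rows) for key, rows in frame.groupby("key")}
print(results)
# {'k1': 'changed', 'k2': 'single', 'k3': 'needs_review'}
```

Branching on len(rows) inside the loop avoids having to split the groupby object itself, which is the step the question struggles with.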

Regarding python - Pandas : Making Decision on groupby size(), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39420183/
