
python - Pandas - can only convert an array of size 1 to a Python scalar

Reposted. Author: 行者123. Updated: 2023-12-04 01:27:07

I have two dataframes:

df_melt:

     MatchID  GameWeek        Date                      Team  Home               AgainstTeam
0      46605         1  2019-08-09                 Liverpool  Home              Norwich City
1      46605         1  2019-08-09              Norwich City  Away                 Liverpool
2      46606         1  2019-08-10           AFC Bournemouth  Home          Sheffield United
3      46606         1  2019-08-10          Sheffield United  Away           AFC Bournemouth
4      46607         1  2019-08-10                   Burnley  Home               Southampton
..       ...       ...         ...                       ...   ...                       ...
533    46871        27  2020-02-23                   Watford  Away         Manchester United
534    46872        27  2020-02-22          Sheffield United  Home  Brighton and Hove Albion
535    46872        27  2020-02-22  Brighton and Hove Albion  Away          Sheffield United
536    46873        27  2020-02-22               Southampton  Home               Aston Villa
537    46873        27  2020-02-22               Aston Villa  Away               Southampton

And, for player matches, df_pm:

                                       Player  GameWeek  Minutes  ...  CloseShotCreated  TotalShotCreated  HeadersCreated
PlayerMatchesDetailID                                             ...
1                                     Alisson         1       90  ...                 0                 0               0
2                             Virgil van Dijk         1       90  ...                 0                 0               0
3                                Joseph Gomez         1       90  ...                 0                 1               0
4                            Andrew Robertson         1       90  ...                 0                 1               0
5                      Trent Alexander-Arnold         1       90  ...                 3                 3               1
...                                       ...       ...      ...  ...               ...               ...             ...
15053                             Matty James        22        0  ...                 0                 0               0
15054                             Matty James        23        0  ...                 0                 0               0
15055                             Matty James        24        0  ...                 0                 0               0
15056                             Matty James        25        0  ...                 0                 0               0
15057                             Matty James        26        0  ...                 0                 0               0

Now I am trying to iterate over df_pm and look up items in df_melt based on certain conditions, like this:

#Instantiate an empty list
match_ids = []
home_away = []
dates = []

#For each row in the player matches dataframe...
for row in df_pm.itertuples():
    #Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek

    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                           &(df_melt['Team']==team)
                           &(df_melt['AgainstTeam']==againstteam),
                           'MatchID'].item()

    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                       &(df_melt['Team']==team)
                       &(df_melt['AgainstTeam']==againstteam),
                       'Date'].item()

    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                       &(df_melt['Team']==team)
                       &(df_melt['AgainstTeam']==againstteam),
                       'Home'].item()

    #Add it to the list
    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)

But on every iteration, even though I can print "team", "againstteam" and "gameweek", I get the following error:

Traceback (most recent call last):
  File "tableau_data_generation.py", line 155, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

...which suggests that the item does not exist.
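
As a minimal sketch of when this error appears, the snippet below calls `.item()` on selections of size one, zero and two. The toy dataframe and the helper `try_item` are illustrative assumptions, not part of the original script. It shows that the error means "not exactly one match", which covers both a missing item and a duplicated one.

```python
import pandas as pd

# Toy stand-in for df_melt; the values are illustrative only.
df = pd.DataFrame({
    "GameWeek": [1, 1, 2],
    "Team": ["Liverpool", "Norwich City", "Liverpool"],
    "MatchID": [46605, 46605, 46700],
})

def try_item(selection):
    """Hypothetical helper: return selection.item(), or the error text if it fails."""
    try:
        return selection.item()
    except ValueError as exc:
        return f"ValueError: {exc}"

one_match = df.loc[(df["GameWeek"] == 1) & (df["Team"] == "Liverpool"), "MatchID"]
no_match = df.loc[(df["GameWeek"] == 3) & (df["Team"] == "Liverpool"), "MatchID"]
two_matches = df.loc[df["GameWeek"] == 1, "MatchID"]

for name, selection in [("one", one_match), ("none", no_match), ("two", two_matches)]:
    # Only the selection with exactly one element converts to a scalar.
    print(name, len(selection), try_item(selection))
```

Printing `len(selection)` before calling `.item()` is a quick way to tell whether a failing lookup matched nothing or matched several rows.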

But when I print the full dataframe df_melt, like this:

with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df_melt, df_melt.shape)

I get (538, 6) and can see that all of the data is there, with nothing wrong.


When I check the dtypes, I see:

df_melt:

MatchID        object
GameWeek       object
Date           object
Team           object
Home           object
AgainstTeam    object

df_pm:

Player                 object
GameWeek                int64
Minutes                 int64
ForTeam                object
AgainstTeam            object
Goals                   int64
ShotsOnTarget           int64
ShotsInBox              int64
CloseShots              int64
TotalShots              int64
Headers                 int64
GoalAssists             int64
ShotOnTargetCreated     int64
ShotInBoxCreated        int64
CloseShotCreated        int64
TotalShotCreated        int64
HeadersCreated          int64

So there is a type mismatch here: GameWeek is object in df_melt but int64 in df_pm.
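
To illustrate why such a mismatch makes every lookup come back empty, here is a small sketch. It assumes the object column holds strings such as "1" (the question only shows the dtype, not the values), and `df_melt_demo` is a made-up frame rather than the real data.

```python
import pandas as pd

# Assumption: GameWeek is stored as strings, which is one common reason
# for an object dtype. The frame below is illustrative only.
df_melt_demo = pd.DataFrame({
    "GameWeek": pd.Series(["1", "1", "2"], dtype=object),
    "MatchID": [46605, 46605, 46700],
})
gameweek = 1  # an int, like row.GameWeek taken from an int64 column

# Comparing the string "1" with the integer 1 is False for every row.
matches_before = int((df_melt_demo["GameWeek"] == gameweek).sum())

# After converting the column to a numeric dtype the comparison works.
df_melt_demo["GameWeek"] = pd.to_numeric(df_melt_demo["GameWeek"])
matches_after = int((df_melt_demo["GameWeek"] == gameweek).sum())

print(matches_before, matches_after)
```

No exception is raised by the mismatched comparison itself; the mask is simply all False, so the failure only surfaces later when `.item()` is called on the empty selection.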


If I add the following line of code before running the iteration:

df_melt['GameWeek'] = pd.to_numeric(df_melt['GameWeek'])

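I successfully print a few dozen "match_id", "date" and "home" values for the first rows of df_pm.itertuples() (none were printed before I added that line), only for the loop to break again with the same error: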

ValueError: can only convert an array of size 1 to a Python scalar

How can I fix this?


Note: this is what comes after the code above.

def matchid_lookup(player, date, team, gameweek):
    try:
        try:
            return df_pm.loc[(df_pm['Date']==date)
                             &(df_pm['Player']==player), 'MatchID'].item()
        except:
            return df_pm.loc[(df_pm['Date']==date)
                             &(df_pm['ForTeam']==team), 'MatchID'].iloc[0]
    except:
        return df_pm.loc[(df_pm['GameWeek']==gameweek)
                         &(df_pm['Player']==player), 'MatchID'].item()

#Declare the list as a column in the player matches df
df_pm['MatchID']=match_ids
df_pm['Date']=pd.to_datetime(dates)
df_pm['Home']=home_away
df_pm['Position']=df_pm['Player'].map(pos_lookup)

#Get the match IDs column first in the dataframe
cols = list(df_pm.columns)
new_cols = ['MatchID', 'Date', 'Home','Position'] + cols[:-4]
df_pm = df_pm[new_cols]

#Bring in stats from api table
#First, get key identifiers into the api table to facilitate joining
df_api_stats['Player'] = df_api_stats['PlayerID'].map(player_lookup)
df_api_stats['Team'] = df_api_stats['PlayerID'].map(team_lookup)
df_api_stats['MatchID'] = df_api_stats.apply(lambda x: matchid_lookup(x['Player'],
                                                                      x['Date'],
                                                                      x['Team'],
                                                                      x['GameWeek']), axis=1)
api_cols = ['Player', 'MatchID', 'BPS', 'MinutesPlayed',
            'CleanSheet', 'Saves', 'NetTransfersIn',
            'SelectedBy', 'Points', 'Price']

df_api_cols = df_api_stats[api_cols]

Best answer

There are some Date values in df_api_stats that are not in df_pm. You can see them with:

print (set(pd.to_datetime(df_api_stats['Date'])) - set(pd.to_datetime(df_pm['Date'])))
{Timestamp('2020-01-29 00:00:00'),
Timestamp('2020-02-28 00:00:00'),
Timestamp('2020-02-29 00:00:00'),
Timestamp('2020-03-01 00:00:00'),
Timestamp('2020-03-07 00:00:00'),
Timestamp('2020-03-08 00:00:00'),
Timestamp('2020-03-09 00:00:00')}

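I am not sure how you want to handle the missing values, but to keep the method from failing you can add one more except and return nan when none of the possibilities match.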

import numpy as np  # provides np.nan for the final fallback

def matchid_lookup(player, date, team, gameweek):
    try:
        try:
            return df_pm.loc[(df_pm['Date']==date)
                             &(df_pm['Player']==player), 'MatchID'].item()
        except:
            return df_pm.loc[(df_pm['Date']==date)
                             &(df_pm['ForTeam']==team), 'MatchID'].iloc[0]
    except:
        try:
            return df_pm.loc[(df_pm['GameWeek']==gameweek)
                             &(df_pm['Player']==player), 'MatchID'].item()
        except:
            return np.nan
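
The same fallback order can also be written without nested try/except, by checking how many rows each selection returned. This is only a sketch of an alternative, not the answer's method: the name `matchid_lookup_sketch`, the explicit `df` parameter and the demo frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def matchid_lookup_sketch(df, player, date, team, gameweek):
    """Hypothetical variant: same three lookups, decided by selection size."""
    by_player_date = df.loc[(df["Date"] == date) & (df["Player"] == player), "MatchID"]
    if len(by_player_date) == 1:      # the case where .item() would succeed
        return by_player_date.iloc[0]
    by_team_date = df.loc[(df["Date"] == date) & (df["ForTeam"] == team), "MatchID"]
    if len(by_team_date) > 0:         # the case where .iloc[0] would succeed
        return by_team_date.iloc[0]
    by_player_week = df.loc[(df["GameWeek"] == gameweek) & (df["Player"] == player), "MatchID"]
    if len(by_player_week) == 1:
        return by_player_week.iloc[0]
    return np.nan                     # nothing matched unambiguously

# Illustrative data only.
df_demo = pd.DataFrame({
    "Player": ["Alisson", "Virgil van Dijk"],
    "ForTeam": ["Liverpool", "Liverpool"],
    "Date": pd.to_datetime(["2019-08-09", "2019-08-09"]),
    "GameWeek": [1, 1],
    "MatchID": [46605, 46605],
})
found = matchid_lookup_sketch(df_demo, "Alisson", pd.Timestamp("2019-08-09"), "Liverpool", 1)
missing = matchid_lookup_sketch(df_demo, "Nobody", pd.Timestamp("2020-03-09"), "Nowhere FC", 30)
print(found, missing)
```

Unlike a bare `except:`, this version does not hide unrelated problems such as a misspelled column name, which would still raise a KeyError.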

Note: do not forget to run this right before the for loop that was causing the problem:

df_melt['GameWeek'] = pd.to_numeric(df_melt['GameWeek'])
df_melt[['Team', 'AgainstTeam']] = df_melt[['Team', 'AgainstTeam']]\
    .replace('AFC Bournemouth', 'Bournemouth')
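
One way to find mismatched keys such as the team name above is a left merge with `indicator=True`, which can also stand in for the row-by-row loop. The sketch below is an alternative technique, not what the answer uses. The demo frames, the helper `attach_match_info` and the players "Player A" and "Player B" are made up for illustration, and it assumes GameWeek is already numeric in both frames.

```python
import pandas as pd

# Illustrative stand-ins for df_melt and df_pm.
df_melt_demo = pd.DataFrame({
    "MatchID": [46605, 46605, 46606, 46606],
    "GameWeek": [1, 1, 1, 1],
    "Date": ["2019-08-09", "2019-08-09", "2019-08-10", "2019-08-10"],
    "Team": ["Liverpool", "Norwich City", "AFC Bournemouth", "Sheffield United"],
    "Home": ["Home", "Away", "Home", "Away"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United", "AFC Bournemouth"],
})
df_pm_demo = pd.DataFrame({
    "Player": ["Alisson", "Player A", "Player B"],
    "GameWeek": [1, 1, 1],
    "ForTeam": ["Liverpool", "Bournemouth", "Sheffield United"],
    "AgainstTeam": ["Norwich City", "Sheffield United", "Bournemouth"],
})

def attach_match_info(pm, melt):
    """Hypothetical helper: left-join match details onto the player rows."""
    return pm.merge(
        melt,
        how="left",
        left_on=["GameWeek", "ForTeam", "AgainstTeam"],
        right_on=["GameWeek", "Team", "AgainstTeam"],
        validate="many_to_one",  # raises if a key matches several matches
        indicator=True,          # adds a _merge column: "both" or "left_only"
    )

merged = attach_match_info(df_pm_demo, df_melt_demo)
unmatched = merged.loc[merged["_merge"] == "left_only", ["Player", "ForTeam", "AgainstTeam"]]
print(unmatched)  # the rows whose team names differ between the two frames

# Apply the same name normalization as above and merge again.
df_melt_fixed = df_melt_demo.copy()
df_melt_fixed[["Team", "AgainstTeam"]] = df_melt_fixed[["Team", "AgainstTeam"]].replace(
    "AFC Bournemouth", "Bournemouth")
merged_fixed = attach_match_info(df_pm_demo, df_melt_fixed)
print(merged_fixed[["Player", "MatchID", "Date", "Home"]])
```

Rows marked `left_only` are the ones for which `.item()` would have raised in the loop, so inspecting them shows which names or game weeks still need cleaning.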

Regarding python - Pandas - can only convert an array of size 1 to a Python scalar, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61705173/
