Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape

Reposted. Author: 行者123. Updated: 2023-11-28 21:40:14

I have a DataFrame called "bal". It looks like this:

              ano   id   unit  period
business_id
9564         2012  302  sdasd   anual
9564         2011  303  sdasd   anual
2361         2013  304  sdasd   anual
2361         2012  305  sdasd   anual
...

I run the following code on it:

bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)

My intention is to turn it into something like this:

              ano    id   unit  period  2006  2007  2008  2009  2010  \
business_id
72           2013   774  sdasd   anual   NaN   NaN   NaN   NaN   NaN
72           2012   775  sdasd   anual   NaN   NaN   NaN   NaN   NaN
74           2012  1120  sdasd   anual   NaN   NaN   NaN   NaN   NaN
119          2013   875  sdasd   anual   NaN   NaN   NaN   NaN   NaN
119          2012   876  sdasd   anual   NaN   NaN   NaN   NaN   NaN
...

When I run that code, I get this error:

ValueError: Index contains duplicate entries, cannot reshape

So, to get rid of the duplicates, I added a drop_duplicates line:

bal=bal.drop_duplicates()
bal=bal.merge(bal.pivot(columns='ano', values='id'),right_index=True,left_index=True)

When I run the code, lo and behold, I get the same error:

ValueError: Index contains duplicate entries, cannot reshape

Am I doing something wrong, or misunderstanding something?

Edit

bal is a DataFrame that I create from SQL with the following code:

bal=pd.read_sql('select * from table;',connection).set_index('business_id')[['ano','id','unit','period']]

Strangely, if I limit the SQL query, it works fine:

bal=pd.read_sql('select * from table limit 1000;',connection).set_index('business_id')[['ano','id','unit','period']]

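I think the problem may be related to the index having many duplicates (as you can see in the example above). However, if I print(bal.head(4)) on this limited bal, it looks exactly like what you see above, with duplicated index values.

Best answer

Update 2: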

qry = "select distinct business_id,ano,id,unit,period from table where period='anual'"
bal=pd.read_sql(qry, connection, index_col=['business_id'])

Suppose we get the following DF (the ano column still has duplicate values for the same business_id):

In [167]: bal
Out[167]:
              ano   id   unit  period
business_id
9564         2012  302  sdasd   anual
9564         2012  299  sdasd   anual
9564         2011  303  sdasd   anual
2361         2013  304  sdasd   anual
2361         2012  305  sdasd   anual

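We can do this, using pivot_table with aggfunc='first' to keep the first id for each (business_id, ano) pair: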

In [169]: bal.join(bal.pivot_table(index=bal.index, columns='ano',
     ...:                          values='id', aggfunc='first'))
Out[169]:
              ano   id   unit  period   2011   2012   2013
business_id
2361         2013  304  sdasd   anual    NaN  305.0  304.0
2361         2012  305  sdasd   anual    NaN  305.0  304.0
9564         2012  302  sdasd   anual  303.0  302.0    NaN
9564         2012  299  sdasd   anual  303.0  302.0    NaN
9564         2011  303  sdasd   anual  303.0  302.0    NaN
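
For anyone who wants to try this without a database, here is a self-contained sketch of the same idea. The hand-built DataFrame is my own stand-in for the SQL result, and I call reset_index() before pivot_table so that business_id is an ordinary column; that is a small variation on the call above, not the original code. Note that aggfunc='first' keeps whichever id appears first for each (business_id, ano) pair, so the result depends on row order.

```python
import pandas as pd

# Hand-built stand-in for the SQL result (assumed data, copied from the sample DF)
bal = pd.DataFrame(
    {
        "business_id": [9564, 9564, 9564, 2361, 2361],
        "ano": [2012, 2012, 2011, 2013, 2012],
        "id": [302, 299, 303, 304, 305],
        "unit": ["sdasd"] * 5,
        "period": ["anual"] * 5,
    }
).set_index("business_id")

# One row per business_id, one column per year; duplicated
# (business_id, ano) pairs collapse to the first id encountered
wide = bal.reset_index().pivot_table(
    index="business_id", columns="ano", values="id", aggfunc="first"
)

# Attach the yearly columns to every original row via the shared index
result = bal.join(wide)
print(result)
```

If the first id is not the one you want, another aggregation such as 'last' or 'max' can be passed as aggfunc instead; which one is right depends on what the duplicated rows mean in your data.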

Update:

Consider the following sample DF:

In [161]: bal
Out[161]:
              ano   id   unit  period
business_id
9564         2012  302  sdasd   anual
9564         2012  299  sdasd   anual   # I've intentionally added this row with a duplicated `ano`
9564         2011  303  sdasd   anual
2361         2013  304  sdasd   anual
2361         2012  305  sdasd   anual

This reproduces your error:

In [162]: bal.pivot(columns='ano', values='id')
...
skipped
...
ValueError: Index contains duplicate entries, cannot reshape
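
A note on why drop_duplicates does not help here: it compares column values only and ignores the index, so two rows that share business_id and ano but differ in id are both kept. pivot, however, needs each (index, columns) pair to be unique. The sketch below rebuilds the sample DF by hand (the construction is my own, not from the question) and lists the offending rows. It assumes a reasonably recent pandas and may need adjusting for older versions.

```python
import pandas as pd

# Hand-built copy of the sample DF above (assumed construction, not from the question)
bal = pd.DataFrame(
    {
        "business_id": [9564, 9564, 9564, 2361, 2361],
        "ano": [2012, 2012, 2011, 2013, 2012],
        "id": [302, 299, 303, 304, 305],
        "unit": ["sdasd"] * 5,
        "period": ["anual"] * 5,
    }
).set_index("business_id")

# drop_duplicates looks only at column values, so nothing is removed:
# the two 9564/2012 rows differ in `id`
after_drop = bal.drop_duplicates()

# Rows whose (business_id, ano) pair occurs more than once
pairs = bal.reset_index()[["business_id", "ano"]]
dupes = bal[pairs.duplicated(keep=False).to_numpy()]
print(dupes)

# pivot still fails because those pairs are not unique
error_message = None
try:
    bal.pivot(columns="ano", values="id")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Running a check like this on the full table should show which business_id/ano combinations need to be aggregated or filtered before pivoting.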

Old answer:

Is this what you want?

In [144]: bal.join(bal.pivot(columns='ano', values='id'))
Out[144]:
              ano   id   unit  period   2011   2012   2013
business_id
2361         2013  304  sdasd   anual    NaN  305.0  304.0
2361         2012  305  sdasd   anual    NaN  305.0  304.0
9564         2012  302  sdasd   anual  303.0  302.0    NaN
9564         2011  303  sdasd   anual  303.0  302.0    NaN

Regarding Python/Pandas - ValueError: Index contains duplicate entries, cannot reshape, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45825381/
