python-3.x - Cannot set with loc when the index is labeled with strings

Reposted. Author: 行者123. Last updated: 2023-12-05 04:04:49

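I am using pandas 0.23.4 and Python 3.7.0. I have two tables with the same shape and columns. I want to set an index-selected subset of one table to the same index-selected subset of the other.

It sometimes fails with a string index, but I am not sure the failure has anything to do with the index being made of strings. Commenting out one unused column of the DataFrame makes it work.

The traceback below shows that the failure happens somewhere inside the indexing code, which gets confused when determining whether the target and the source have the same length. The code is long and somewhat convoluted.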

$ cat foo.py
import numpy as np
import pandas as pd

m = np.array([1., 2., 1., 3., 5., 5., 6., 2., 2., 1., 7., 2.,
              5., 4., 2., 5., 5., 5., 3., 8., 7., 2., 7., 6.])
dma_l = [501, 501, 501, 501, 501, 501, 501, 501, 501, 501, 501, 501,
         502, 502, 502, 502, 502, 502, 502, 502, 502, 502, 502, 502]
size_l = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,
          1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

age_l = ['20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45',
         '20-25', '30-35', '40-45']

df = pd.DataFrame()
df['dma'] = dma_l # <-- comment out this line and it works
df['size'] = size_l
df['age'] = age_l
df['total'] = m

df2 = df.copy() # Make a second dataframe with the same shape.

# Works with an integer index.
df.set_index('size', inplace=True)
df2.set_index('size', inplace=True)
df.loc[(1,), 'total'] = df2.loc[(1,), 'total']

# Does not work with my string index. Removing the dma column
# causes it to work again.
df.set_index('age', inplace=True)
df2.set_index('age', inplace=True)
df.loc[('20-25',), 'total'] = df2.loc[('20-25',), 'total']

$ python foo.py
Traceback (most recent call last):
  File "...", line 34, in <module>
    df.loc[('20-25',), 'total'] = df2.loc[('20-25',), 'total']
  File ".../lib/python3.7/site-packages/pandas/core/indexing.py", line 189, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File ".../lib/python3.7/site-packages/pandas/core/indexing.py", line 606, in _setitem_with_indexer
    raise ValueError('Must have equal len keys and value '
ValueError: Must have equal len keys and value when setting with an iterable

Best answer

All of the solutions below work if both DataFrames have the same number of rows and the same index values.
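That precondition can be checked before attempting the assignment. The following is a minimal sketch on a small made-up frame (the data is illustrative, not the question's full table); it only uses `Index.equals` and `Index.is_unique` to confirm that the two indexes match and that the labels are duplicated.

```python
import pandas as pd

# Small illustrative frame with a repeated string index (assumed data).
ages = ["20-25", "30-35", "40-45"] * 2
df = pd.DataFrame({"age": ages,
                   "total": [1.0, 2.0, 1.0, 3.0, 5.0, 5.0]}).set_index("age")
df2 = df.copy()

# Both frames must have identical index values in the same order.
same_index = df.index.equals(df2.index)

# Duplicated labels mean one label selects several rows.
has_duplicates = not df.index.is_unique
n_selected = int((df.index == "20-25").sum())

print(same_index, has_duplicates, n_selected)
```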


The problem is the duplicated index values. If you want to select by a single index value, the solution is to create a boolean mask:

df2['total'] *= 10
df.loc[df.index == 1, 'total'] = df2.loc[1, 'total']
print(df)
      dma    age  total
size
1     501  20-25   10.0
1     501  30-35   20.0
1     501  40-45   10.0
2     501  20-25    3.0
2     501  30-35    5.0
2     501  40-45    5.0
3     501  20-25    6.0
3     501  30-35    2.0
3     501  40-45    2.0
4     501  20-25    1.0
4     501  30-35    7.0
4     501  40-45    2.0
1     502  20-25   50.0
1     502  30-35   40.0
1     502  40-45   20.0
2     502  20-25    5.0
2     502  30-35    5.0
2     502  40-45    5.0
3     502  20-25    3.0
3     502  30-35    8.0
3     502  40-45    7.0
4     502  20-25    2.0
4     502  30-35    7.0
4     502  40-45    6.0
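Applied to the string index from the question, the same idea might look like the sketch below. The frame is a shortened, made-up version of the question's data, and the `.to_numpy()` call on the right-hand side is an addition that is not part of the answer: it turns the assignment into a purely positional one, so pandas does not try to align duplicated labels.

```python
import pandas as pd

# Shortened stand-in for the question's data (assumed values).
df = pd.DataFrame({
    "dma": [501, 501, 501, 502, 502, 502],
    "age": ["20-25", "30-35", "40-45"] * 2,
    "total": [1.0, 2.0, 1.0, 5.0, 4.0, 2.0],
}).set_index("age")
df2 = df.copy()
df2["total"] *= 10

# Boolean mask selecting every row labeled '20-25'.
mask = df.index == "20-25"

# Copy the selected values by position rather than by label.
df.loc[mask, "total"] = df2.loc[mask, "total"].to_numpy()
print(df)
```

Only the two rows labeled `'20-25'` receive the values from `df2`; the other rows keep their original totals.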

Or, for a more general solution, create a new column by assignment:

df2['total'] *= 10
df['total1'] = df2['total']
# working with one DataFrame
df.loc[[1, 4], 'total'] = df.loc[[1, 4], 'total1']
print(df)
      dma    age  total  total1
size
1     501  20-25   10.0    10.0
1     501  30-35   20.0    20.0
1     501  40-45   10.0    10.0
2     501  20-25    3.0    30.0
2     501  30-35    5.0    50.0
2     501  40-45    5.0    50.0
3     501  20-25    6.0    60.0
3     501  30-35    2.0    20.0
3     501  40-45    2.0    20.0
4     501  20-25   10.0    10.0
4     501  30-35   70.0    70.0
4     501  40-45   20.0    20.0
1     502  20-25   50.0    50.0
1     502  30-35   40.0    40.0
1     502  40-45   20.0    20.0
2     502  20-25    5.0    50.0
2     502  30-35    5.0    50.0
2     502  40-45    5.0    50.0
3     502  20-25    3.0    30.0
3     502  30-35    8.0    80.0
3     502  40-45    7.0    70.0
4     502  20-25   20.0    20.0
4     502  30-35   70.0    70.0
4     502  40-45   60.0    60.0

Another solution is to create a mask and use it to filter both DataFrames:

df2['total'] *= 10

mask = df.index.isin([1, 4])
df.loc[mask, 'total'] = df2.loc[mask, 'total']
print(df)
      dma    age  total
size
1     501  20-25   10.0
1     501  30-35   20.0
1     501  40-45   10.0
2     501  20-25    3.0
2     501  30-35    5.0
2     501  40-45    5.0
3     501  20-25    6.0
3     501  30-35    2.0
3     501  40-45    2.0
4     501  20-25   10.0
4     501  30-35   70.0
4     501  40-45   20.0
1     502  20-25   50.0
1     502  30-35   40.0
1     502  40-45   20.0
2     502  20-25    5.0
2     502  30-35    5.0
2     502  40-45    5.0
3     502  20-25    3.0
3     502  30-35    8.0
3     502  40-45    7.0
4     502  20-25   20.0
4     502  30-35   70.0
4     502  40-45   60.0

Regarding "python-3.x - Cannot set with loc when the index is labeled with strings", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51979146/
