
python - Using pandas pivot_table with Interval columns results in TypeError

Reposted. Author: 行者123. Last updated: 2023-11-28 21:32:22

            cat1          cat2  col_a  col_b
0   (34.0, 38.0]  (15.9, 47.0]     29     10
1   (34.0, 38.0]  (15.9, 47.0]     37     22
2   (28.0, 34.0]  (47.0, 56.0]      3     13
3   (34.0, 38.0]  (47.0, 56.0]     15      7
4   (28.0, 34.0]  (56.0, 67.0]     42     20
5   (28.0, 34.0]  (47.0, 56.0]     31     23
6   (28.0, 34.0]  (56.0, 67.0]     26     17
7   (28.0, 34.0]  (56.0, 67.0]      7      1
8   (28.0, 34.0]  (56.0, 67.0]     36     19
9   (19.0, 28.0]  (56.0, 67.0]      5      7
10  (19.0, 28.0]  (56.0, 67.0]     21      5
11  (28.0, 34.0]  (67.0, 84.0]     37     13

With the dataframe above, I want to build a pivot table using pandas:

pd.pivot_table(df, index='cat1', columns='cat2', values='col_a')

But I get this error:

TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

col_a and col_b are both of type int32, and cat1 and cat2 are both categorical. How can I get rid of this error?

Best answer

This is a bug related to pivoting on columns that hold intervals (see GH25814), and it will be fixed in v0.25. See also this related question about crosstab: Pandas crosstab on CategoricalDType columns throws TypeError
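If upgrading is an option, a quick check along the following lines may show whether the installed pandas already pivots interval categories directly. This is only a sketch: the toy data is made up for illustration, and observed=True is passed just to restrict the result to categories that actually occur.

```python
import numpy as np
import pandas as pd

# Made-up data; pd.cut produces categorical columns whose categories are intervals
df = pd.DataFrame({
    "cat1": pd.cut([5, 15, 15, 45], bins=np.arange(0, 51, 10)),
    "cat2": pd.cut([12, 12, 38, 38], bins=np.arange(0, 51, 10)),
    "col_a": [1, 2, 4, 8],
})

print(pd.__version__)

# On a release that contains the fix, this call should not raise TypeError
result = df.pivot_table(index="cat1", columns="cat2", values="col_a",
                        aggfunc="mean", observed=True)
print(result)
```

If this call still raises the TypeError from the question, the workarounds in this answer apply.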

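In the meantime, here are some options. To aggregate, you will have to use pivot_table and convert the categorical columns to strings before pivoting.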

df2 = df.assign(cat1=df['cat1'].astype(str), cat2=df['cat2'].astype(str))
# to aggregate by taking the mean of col_a
df2.pivot_table(index='cat1', columns='cat2', values='col_a', aggfunc='mean')

The caveat here is that you lose the benefit of having the index and columns as intervals.
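To make that caveat concrete, here is a small sketch of what an interval index offers and what string labels give up. The bin edges and values are hypothetical, chosen only to resemble the cat1 intervals in the question.

```python
import pandas as pd

# Hypothetical bin edges resembling the cat1 intervals in the question
intervals = pd.IntervalIndex.from_breaks([19.0, 28.0, 34.0, 38.0])
by_interval = pd.Series([10, 20, 30], index=intervals)

# An IntervalIndex can be queried with a scalar that falls inside an interval
value_at_30 = by_interval.loc[30.0]  # 30.0 lies in (28.0, 34.0]

# With string labels the same scalar no longer matches anything
by_label = pd.Series([10, 20, 30], index=[str(iv) for iv in intervals])
label_lookup_works = 30.0 in by_label.index

# String labels also sort lexicographically, not numerically
labels = sorted(["(2, 10]", "(10, 20]"])

print(value_at_30, label_lookup_works, labels)
```

The last line illustrates why a string-labelled pivot can come out in an unexpected order when bin edges have different numbers of digits.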

Another option is to pivot on the category codes and then reassign the categories:

df2 = df.assign(cat1=df['cat1'].cat.codes, cat2=df['cat2'].cat.codes)
pivot = df2.pivot_table(
    index='cat1', columns='cat2', values='col_a', aggfunc='mean')

pivot.index = df['cat1'].cat.categories
pivot.columns = df['cat2'].cat.categories

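This assignment works because pivot_table sorts the group keys beforehand, so the integer codes come out in the same order as the categories. Note that the assignment is positional: it assumes every category of cat1 and cat2 occurs in the data and that neither column contains missing values (code -1); otherwise the lengths will not match.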
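When some bins are empty, one possible variation is to look up each observed code in the categories instead of assigning the full category list. A minimal sketch, assuming made-up data and no missing values in either column (a code of -1 would be looked up incorrectly):

```python
import numpy as np
import pandas as pd

# Made-up data in which several of the five bins are never observed
df = pd.DataFrame({
    "cat1": pd.cut([5, 15, 15, 45], bins=np.arange(0, 51, 10)),
    "cat2": pd.cut([12, 12, 38, 38], bins=np.arange(0, 51, 10)),
    "col_a": [1, 2, 4, 8],
})

# Pivot on the integer category codes
codes = df.assign(cat1=df["cat1"].cat.codes, cat2=df["cat2"].cat.codes)
pivot = codes.pivot_table(index="cat1", columns="cat2",
                          values="col_a", aggfunc="mean")

# Map each observed code back to its interval, so unused bins cause no mismatch
pivot.index = df["cat1"].cat.categories.take(pivot.index.to_numpy())
pivot.columns = df["cat2"].cat.categories.take(pivot.columns.to_numpy())
print(pivot)
```

Here the pivot has three rows and two columns while each column has five categories, so assigning the categories positionally would fail with a length mismatch.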


Minimal code example

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame({
    'cat1': np.random.choice(100, 10),
    'cat2': np.random.choice(100, 10),
    'col_a': np.random.randint(1, 50, 10)})

df['cat1'] = pd.cut(df['cat1'], bins=np.arange(0, 101, 10))
df['cat2'] = pd.cut(df['cat2'], bins=np.arange(0, 101, 10))

df
       cat1      cat2  col_a
0  (40, 50]  (60, 70]     18
1  (40, 50]  (80, 90]     38
2  (60, 70]  (80, 90]     26
3  (60, 70]  (10, 20]     14
4  (60, 70]  (50, 60]      9
5   (0, 10]  (60, 70]     10
6  (80, 90]  (30, 40]     21
7  (20, 30]  (80, 90]     17
8  (30, 40]  (40, 50]      6
9  (80, 90]  (80, 90]     16

(df.assign(cat1=df['cat1'].astype(str), cat2=df['cat2'].astype(str))
   .pivot_table(index='cat1', columns='cat2', values='col_a', aggfunc='mean'))

cat2      (10, 20]  (30, 40]  (40, 50]  (50, 60]  (60, 70]  (80, 90]
cat1
(0, 10]        NaN       NaN       NaN       NaN      10.0       NaN
(20, 30]       NaN       NaN       NaN       NaN       NaN      17.0
(30, 40]       NaN       NaN       6.0       NaN       NaN       NaN
(40, 50]       NaN       NaN       NaN       NaN      18.0      38.0
(60, 70]      14.0       NaN       NaN       9.0       NaN      26.0
(80, 90]       NaN      21.0       NaN       NaN       NaN      16.0

Regarding "python - Using pandas pivot_table with Interval columns results in TypeError", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/56667958/
