
python - pandas: incompatible shape when adding a complex-type column

Reposted. Author: 行者123. Updated: 2023-11-30 21:51:34

How can I add a complex type (i.e. a NumPy array) as a column to a pandas DataFrame?

import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
print(df)  # display(df) in Jupyter

my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
                      [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]],

                     [[0.52184832, 0.41466194, 0.26455561, 0.77423369],
                      [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]])

print(my_array)
print(df.shape)        # (2, 2)
print(my_array.shape)  # (2, 2, 4)

# Fails: the 3-D array cannot be assigned directly as a single column
df['complex_type'] = my_array

This fails with:

AssertionError: Shape of new values must be compatible with manager shape

My pandas version is 1.0.0.

Edit

A more complex example:

#%%timeit
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

rng = np.random.RandomState(0)
n_points = 50
d_dimensions = 4
k_neighbours = 3

X = rng.random_sample((n_points, d_dimensions))

df = pd.DataFrame(X)
df = df.reset_index(drop=False)
df.columns = ['id_str', 'lat_1', 'long_1', 'lat_2', 'long_2']
df['id_str'] = df['id_str'].astype(object)

tree = cKDTree(df[['lat_1', 'long_1', 'lat_2', 'long_2']])
# Newer SciPy releases call this argument `workers`; older ones used `n_jobs`
dist, ind = tree.query(X, k=k_neighbours, workers=-1)

# X[ind] has shape (n_points, k_neighbours, d_dimensions); each (k, d) slice
# becomes one cell, so the new column has exactly one entry per row
df = df.join(pd.DataFrame({'complex_type': [arr for arr in X[ind]]}))
#df['complex_type'] = list(X[ind])
df.head()
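If the goal is mainly to keep each point's neighbour coordinates on its row, one alternative (a sketch, not what the question or the answer uses) is to flatten the (n_points, k_neighbours, d_dimensions) block into ordinary float columns. To keep the sketch NumPy-only, a brute-force distance search is assumed as a stand-in for cKDTree.query, the id_str column is left out, and names such as nb0_lat_1 are a hypothetical naming scheme.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
n_points, d_dimensions, k_neighbours = 50, 4, 3
coords = ['lat_1', 'long_1', 'lat_2', 'long_2']
X = rng.random_sample((n_points, d_dimensions))

# Brute-force k-nearest neighbours: a NumPy-only stand-in for
# cKDTree.query (assumption). Each point comes first as its own neighbour.
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
ind = np.argsort(sq_dist, axis=1)[:, :k_neighbours]

df = pd.DataFrame(X, columns=coords)

# Flatten the (n_points, k, d) block into k * d float columns.
# Hypothetical names: nb0_lat_1, nb0_long_1, ..., nb2_long_2.
neighbours = X[ind]
flat_cols = [f'nb{j}_{c}' for j in range(k_neighbours) for c in coords]
flat = pd.DataFrame(neighbours.reshape(n_points, -1),
                    columns=flat_cols, index=df.index)
df = df.join(flat)

# A row-major reshape reverses the flattening.
restored = df[flat_cols].to_numpy().reshape(n_points, k_neighbours, d_dimensions)
```

Every column keeps a numeric dtype, and the 3-D array can still be rebuilt from the flat columns when it is needed.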

Best answer

In [29]: df = pd.DataFrame({'foo':['bar', 'baz'], 'bar':[1,2]})
    ...: display(df)
    ...:
    ...: my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
    ...:                       [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]],
    ...:
    ...:                      [[0.52184832, 0.41466194, 0.26455561, 0.77423369],
    ...:                       [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]])
    ...:
   foo  bar
0  bar    1
1  baz    2

In [30]: my_array.shape
Out[30]: (2, 2, 4)

Assigning a list of two (2,4) arrays works:

In [31]: df['new'] = list(my_array)
In [32]: df
Out[32]:
   foo  bar                                                new
0  bar    1  [[0.61209572, 0.616934, 0.94374808, 0.6818203]...
1  baz    2  [[0.52184832, 0.41466194, 0.26455561, 0.774233...

In [33]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
foo    2 non-null object
bar    2 non-null int64
new    2 non-null object
dtypes: int64(1), object(2)
memory usage: 176.0+ bytes
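The same one-array-per-row column can also be built without an intermediate Python list by pre-allocating a 1-D object array. This is a minimal sketch assuming the question's my_array; it also shows why the shortcut np.array(list(my_array), dtype=object) does not help here: NumPy descends into equal-shaped sub-arrays and returns a (2, 2, 4) object array instead of two (2, 4) cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
                      [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]],
                     [[0.52184832, 0.41466194, 0.26455561, 0.77423369],
                      [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]])

# The shortcut collapses back into one 3-D object array (not what we want).
collapsed = np.array(list(my_array), dtype=object)

# Pre-allocate one object slot per row and store each (2, 4) array in it.
cells = np.empty(len(my_array), dtype=object)
for i, sub in enumerate(my_array):
    cells[i] = sub

df['new'] = cells
```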

Note, however, that you don't get a (2,2,4) array back from pandas; you get a (2,) object array whose elements are arrays.

In [34]: df['new'].to_numpy()
Out[34]:
array([array([[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
              [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]]),
       array([[0.52184832, 0.41466194, 0.26455561, 0.77423369],
              [0.5488135 , 0.71518937, 0.60276338, 0.54488318]])], dtype=object)
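To get the (2,2,4) float array back, the elements of that (2,) object array can be stacked along a new first axis. A minimal sketch, assuming every cell holds an array of the same shape (np.stack raises a ValueError otherwise):

```python
import numpy as np
import pandas as pd

my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
                      [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]],
                     [[0.52184832, 0.41466194, 0.26455561, 0.77423369],
                      [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]])
df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
df['new'] = list(my_array)

# A (2,) object array whose elements are (2, 4) float arrays.
cells = df['new'].to_numpy()

# np.stack joins the equal-shaped elements along a new first axis,
# rebuilding a regular (2, 2, 4) float64 array.
restored = np.stack(cells)
```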

Also be careful when saving such a frame: a CSV file written from it is hard to load back.
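The sketch below illustrates the problem, assuming the same two-row frame and using in-memory buffers in place of files: to_csv writes each array cell in its printed text form, so read_csv returns plain strings, while a pickle round trip (Python-only, and only safe to load from trusted sources) keeps the arrays intact.

```python
import io
import pickle

import numpy as np
import pandas as pd

my_array = np.array([[[0.61209572, 0.616934  , 0.94374808, 0.6818203 ],
                      [0.4236548 , 0.64589411, 0.43758721, 0.891773  ]],
                     [[0.52184832, 0.41466194, 0.26455561, 0.77423369],
                      [0.5488135 , 0.71518937, 0.60276338, 0.54488318]]])
df = pd.DataFrame({'foo': ['bar', 'baz'], 'bar': [1, 2]})
df['new'] = list(my_array)

# CSV: each array is written as its str() text, so it comes back as a string.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
from_csv = pd.read_csv(buf)

# pickle stores the Python objects themselves, so the arrays survive.
from_pickle = pickle.loads(pickle.dumps(df))
```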

A similar question about python - pandas incompatible shape with complex-type column can be found on Stack Overflow: https://stackoverflow.com/questions/60083394/
