python - Create a Pandas DataFrame from several numpy series

Reposted. Author: 行者123. Updated: 2023-11-28 20:31:55

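I am trying to create a pandas DataFrame whose columns are numpy arrays. I also want to name the columns at creation time.

This seems like a very simple task.

Although the columns come out in the wrong order, it works fine when I do not name them: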

import numpy as np
import pandas as pd

n_obs = 500

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80))

print(df.head())

Output:

49  3.802458
57 3.830600
29 4.991442
47 2.600079
70 1.658041
52 2.236296
37 3.327520
23 1.366954
22 1.509165
36 1.289901
77 3.834789
68 4.370223
40 4.532152
71 2.348842

When I try to name the columns, I get an error:

df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) , np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2']) 

Output:

Traceback (most recent call last):
File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 4622, in create_block_manager_from_blocks
placement=slice(0, len(axes[0])))]
File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 2957, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
File "C:\Users\GBUHR4\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 120, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 1, placement implies 2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "fake.py", line 33, in <module>
df = pd.DataFrame(np.random.uniform(low = 1.1, high = 5.0,size = (n_obs) ) ,
np.random.randint(size = (n_obs), low = 18, high = 80), columns =['col1','col2'
])
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\frame.py", line 361, in __init__
copy=copy)
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\frame.py", line 533, in _init_ndarray
return create_block_manager_from_blocks([values], [columns, index])
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 4631, in create_block_manager_from_blocks
construction_error(tot_items, blocks[0].shape[1:], axes, e)
File "C:\Users\Me\AppData\Local\Continuum\anaconda3\lib\site-packages\pand
as\core\internals.py", line 4608, in construction_error
passed, implied))
ValueError: Shape of passed values is (1, 500), indices imply (2, 500)

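I can't find a tutorial that covers this. It is clearly a very simple problem, but I can't find a solution.

Best answer

Pass the arrays to the DataFrame constructor in a dictionary: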

n_obs = 500

a = np.random.uniform(low = 1.1, high = 5.0,size = (n_obs))
b = np.random.randint(size = (n_obs), low = 18, high = 80)

df = pd.DataFrame({'col1':a, 'col2':b})
print (df.head())
col1 col2
0 2.070148 23
1 1.735960 28
2 4.156209 72
3 4.253241 26
4 3.539951 45
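
As a minimal sketch of why the dictionary form is convenient: each array becomes its own column and keeps its own dtype. The fixed seed is an assumption added here only to make the run repeatable; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
n_obs = 500

a = np.random.uniform(low=1.1, high=5.0, size=n_obs)  # float array
b = np.random.randint(low=18, high=80, size=n_obs)    # integer array

# Each dict key becomes a column name, each array a column.
df = pd.DataFrame({'col1': a, 'col2': b})

# Unlike a single stacked 2-D array, the columns keep separate dtypes.
print(df.dtypes)
print(df.shape)
```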

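If the code may run on Python below 3.6, add the columns parameter to specify the order explicitly (from Python 3.6 onward, the standard dict type preserves insertion order by default):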

df = pd.DataFrame({'col1':a, 'col2':b}, columns=['col2','col1']) 
print (df.head())
col2 col1
0 23 2.070148
1 28 1.735960
2 72 4.156209
3 26 4.253241
4 45 3.539951
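
For comparison, a short sketch of an alternative that does not rely on dict ordering at all: build the frame first, then select the columns in the order wanted. The names `df_ordered` and `df_selected` are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
a = np.random.uniform(low=1.1, high=5.0, size=500)
b = np.random.randint(low=18, high=80, size=500)

# Order fixed at construction time via the columns parameter.
df_ordered = pd.DataFrame({'col1': a, 'col2': b}, columns=['col2', 'col1'])

# Order fixed afterwards by selecting with a list of column names.
df_selected = pd.DataFrame({'col1': a, 'col2': b})[['col2', 'col1']]

print(list(df_ordered.columns))
print(df_ordered.equals(df_selected))
```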

You can also stack the arrays in numpy, but then every column gets the same dtype, here float:

df = pd.DataFrame(np.column_stack((a,b)), columns=['col1','col2']) 
print (df.head())
col1 col2
0 2.070148 23.0
1 1.735960 28.0
2 4.156209 72.0
3 4.253241 26.0
4 3.539951 45.0
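
The upcast happens because a numpy array holds a single dtype, so stacking a float array with an integer array yields floats. If the stacked form is still preferred, one possible follow-up (a sketch, not part of the original answer) is to cast the affected column back with `astype`. The names `df_float` and `df_fixed` are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
a = np.random.uniform(low=1.1, high=5.0, size=500)
b = np.random.randint(low=18, high=80, size=500)

# column_stack builds one (500, 2) array with a single common dtype.
stacked = np.column_stack((a, b))
df_float = pd.DataFrame(stacked, columns=['col1', 'col2'])
print(df_float.dtypes)  # both columns are float here

# Cast col2 back to the integer dtype of the original array.
df_fixed = df_float.astype({'col2': b.dtype})
print(df_fixed.dtypes)
```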

Also, in your solution:

df = pd.DataFrame(a, b) 

the first array becomes the data (a single column) and the second becomes the index, the same as:

df = pd.DataFrame(a, index=b) 
print (df.head())
0
23 2.070148
28 1.735960
72 4.156209
26 4.253241
45 3.539951
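
This also explains the error in the question: the second positional argument of the constructor is the index, so the frame has only one data column, and asking for two column names cannot match. A minimal sketch that reproduces the failure and shows a call that fits the one-column shape (the flag `raised` is illustrative only):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is repeatable
a = np.random.uniform(low=1.1, high=5.0, size=500)
b = np.random.randint(low=18, high=80, size=500)

# a is the data (one column), b is taken as the index,
# so two column names do not fit and pandas raises ValueError.
raised = False
try:
    pd.DataFrame(a, b, columns=['col1', 'col2'])
except ValueError as exc:
    raised = True
    print(exc)

# One column name matches the one data column; b labels the rows.
df = pd.DataFrame(a, index=b, columns=['col1'])
print(df.shape)
```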

Regarding python - Create a Pandas DataFrame from several numpy series, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53537575/
