I have a pandas table that I read from a database, and it contains a covariance matrix stored as one row per pair of labels (the numbers are made up, so the matrix is not positive semi-definite). I would like a fast way to construct a NumPy matrix from the pandas table.
Pandas Table I have
| index1 | index2 | var  |
|--------|--------|------|
| apple  | apple  | 1    |
| apple  | orange | 1    |
| orange | orange | 0.5  |
| lemon  | lemon  | 1.2  |
| orange | lemon  | -0.5 |
| apple  | lemon  | -0.8 |
Expected result
[[1.2, -0.5, -0.8], [-0.5, 0.5, 1.0], [-0.8, 1.0, 1.0]]
Below is the sample code I tried, but it's not very fast.
import numpy as np
import pandas as pd

pd_cov = pd.DataFrame([['apple', 'apple', 1], ['apple', 'orange', 1], ['orange', 'orange', 0.5], ['lemon', 'lemon', 1.2], ['orange', 'lemon', -0.5], ['apple', 'lemon', -0.8]], columns=['index1', 'index2', 'var'])

def cov_obt(x, y):
    # Each pair is stored only once, so try (x, y) first and fall back to (y, x).
    try:
        return float(pd_cov_ind.loc[(x, y), 'var'])
    except KeyError:
        return float(pd_cov_ind.loc[(y, x), 'var'])

ind = list(set(pd_cov['index1']))
pd_cov_ind = pd_cov.set_index(['index1', 'index2'])
np.array([[cov_obt(x, y) for y in ind] for x in ind])
Recommended answer
Here's one approach:
import pandas as pd
import numpy as np
m = pd_cov.pivot_table(index='index1', columns='index2',
                       sort=False, fill_value=0).to_numpy()
m = m + m.T - np.tril(m)
m
array([[ 1. , 1. , -0.8],
[ 1. , 0.5, -0.5],
[-0.8, -0.5, 1.2]])
Explanation
- Use df.pivot_table to pivot the data, with the sort parameter set to False (so the labels keep their order of appearance) and fill_value set to 0 (needed for the next step). Chain to_numpy and assign the result to the variable m.
- The matrix m now has its upper triangle and diagonal filled as expected, while the entries below the diagonal are still zero. Adding m and m.T (its transpose) copies the upper-triangle values into the lower triangle, but it also doubles the diagonal. To undo that, subtract np.tril(m): because everything below the diagonal of m is zero, np.tril(m) contains only the diagonal.
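The formula above relies on every off-diagonal pair landing in the upper triangle of the pivoted matrix, which holds for the sample data. As an alternative, here is a minimal sketch (not the method used above) that writes each value into both triangles with NumPy integer indexing, so it should not matter in which order a pair is stored. The names labels, i, j, vals, cov, order and cov_expected_order are illustrative, and the sketch assumes each pair appears only once in the table.

```python
import numpy as np
import pandas as pd

pd_cov = pd.DataFrame(
    [['apple', 'apple', 1], ['apple', 'orange', 1], ['orange', 'orange', 0.5],
     ['lemon', 'lemon', 1.2], ['orange', 'lemon', -0.5], ['apple', 'lemon', -0.8]],
    columns=['index1', 'index2', 'var'])

# Fix a label order: order of first appearance across both key columns.
labels = pd.Index(pd.unique(pd_cov[['index1', 'index2']].to_numpy().ravel()))

# Translate the labels of every row into integer positions.
i = labels.get_indexer(pd_cov['index1'])
j = labels.get_indexer(pd_cov['index2'])
vals = pd_cov['var'].to_numpy(dtype=float)

# Write each value at (i, j) and at its mirror position (j, i).
cov = np.zeros((len(labels), len(labels)))
cov[i, j] = vals
cov[j, i] = vals

# Optional: reorder rows and columns, e.g. to lemon, orange, apple.
order = labels.get_indexer(['lemon', 'orange', 'apple'])
cov_expected_order = cov[np.ix_(order, order)]
```

With the sample data, cov uses the order apple, orange, lemon and agrees with the array shown in the answer, while cov_expected_order uses the label order of the expected result in the question.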