python - How to convert selected data to the same length (shape)


Reposted by: 行者123 Updated: 2023-12-01 06:40:48

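I am reading several .csv files as pandas DataFrames that all have the same shape. For some index labels, some of the values are zero. For each index label, I want to take the row from every DataFrame, set a position to zero wherever any of the rows has a zero there, and then delete the zeros so that the rows all end up with the same shape: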

import numpy as np
import pandas as pd

# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
a = pd.read_csv("path_a", index_col=0)
b = pd.read_csv("path_b", index_col=0)
c = pd.read_csv("path_c", index_col=0)
print(a, b, c, sep="\n")
X = a.shape[0]        # number of rows
d = a.index.values    # index labels (ISO3 codes)
a = np.array(a)
b = np.array(b)
c = np.array(c)
for i in range(X):
    xdata  = a[i]
    xdata1 = b[i]
    xdata2 = c[i]
    # propagate zeros between the three rows so they share the same zero positions
    xdata  = np.where(xdata2==0,0,xdata)
    xdata1 = np.where(xdata2==0,0,xdata1)
    xdata1 = np.where(xdata==0,0,xdata1)
    xdata2 = np.where(xdata==0,0,xdata2)
    xdata  = np.where(xdata1==0,0,xdata)
    xdata2 = np.where(xdata1==0,0,xdata2)
    # locate the zeros and delete them
    indexX  = np.argwhere(xdata==0)
    index1X = np.argwhere(xdata1==0)
    index2X = np.argwhere(xdata2==0)
    xdata  = np.delete(xdata,indexX)
    xdata1 = np.delete(xdata1,index1X)
    xdata2 = np.delete(xdata2,index2X)
    print(d[i], xdata, xdata1, xdata2, sep="\n")

     1980  1985  1990  1995  2000  2005  2010
ISO3                                          
AFG    0.0   0.0   3.8   0.0   0.0   9.8   0.0
AGO    2.0   0.0   3.0   4.0   0.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3 
      1980  1985  1990  1995  2000  2005  2010
ISO3                                          
AFG    2.5   0.0   0.0   4.7   0.0   0.0   0.0
AGO   13.1  14.9  15.8  16.4  16.9  17.6  18.1
ALB    1.4   1.5   1.6   1.6   1.6   1.6   1.7
AND    0.2   0.2   0.2   0.2   0.1   0.4   0.6
ARE    0.0   0.0   0.0   0.0   0.0   0.0   0.0
ARG    1.8   1.8   1.7   1.8   1.8   1.9   1.9
ARM    1.8   1.8   1.7   0.0   1.8   1.9   1.5 
      1980  1985  1990  1995  2000  2005  2010
ISO3                                          
AFG    0.0   0.0   0.0   0.0   0.0   0.0   0.0
AGO    0.0   0.0   4.7   5.8   6.0   0.0   0.0
ALB    0.0   0.2   0.5   0.2   1.3   1.6   2.7
AND    1.4   1.8   2.3   3.7   0.0   0.0   5.4
ARE    0.7   0.8   0.9   1.7   2.3   2.7   3.0
ARG    3.1   6.7   5.3  15.1  17.2  18.2  18.7
ARM    0.4   0.5   0.5   0.5   0.4   1.2   1.3

AFG 
[] 
[] 
[]
AGO 
[ 3.  4.] 
[ 15.8  16.4] 
[ 4.7  5.8]
ALB 
[ 0.2  0.5  0.2  1.3  1.6  2.7] 
[ 1.5  1.6  1.6  1.6  1.6  1.7] 
[ 0.2  0.5  0.2  1.3  1.6  2.7]
AND 
[] 
[] 
[]
ARE 
[] 
[] 
[]
ARG 
[  3.1   6.7   5.3  15.1  17.2  18.2  18.7] 
[ 1.8  1.8  1.7  1.8  1.8  1.9  1.9] 
[  3.1   6.7   5.3  15.1  17.2  18.2  18.7]
ARM 
[ 0.4  0.5  0.5  0.4  1.2  1.3] 
[ 1.8  1.8  1.7  1.8  1.9  1.5] 
[ 0.4  0.5  0.5  0.4  1.2  1.3]

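This code works, but it is a trial-and-error approach and it is inefficient when the data is large. Could you suggest a more efficient way, and how to select the data according to the minimum-length index?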

Best answer

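One idea is to multiply all 3 arrays and test whether the product is not 0, which gives a boolean mask that is True only where all three values are nonzero. The 3 arrays can also be put in a list L1 and processed in a loop. The logic changes as well: the values are selected with the mask through boolean indexing, instead of np.argwhere and np.delete: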

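X = a.shape[0]        # number of rows
d = a.index.values    # index labels
a = np.array(a)
b = np.array(b)
c = np.array(c)
# True where the value is nonzero in all three arrays
m = (a * b * c) != 0
L1 = [a, b, c]

for i in range(X):
    for arr in L1:
        # boolean indexing keeps the same positions in every array
        xdata = arr[i][m[i]]
        print(xdata)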
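
To make the mask idea concrete, here is a minimal sketch on a single row (the AGO values from the question). The names `row_a`, `row_b`, `row_c`, `mask_product` and `mask_explicit` are only illustrative and do not appear in the original code. It also builds the mask from explicit comparisons; as far as I can tell this gives the same result on this data, and it may be the safer choice if the data can contain NaN (0 * NaN is NaN, which compares as not equal to 0) or values so small that their product underflows to 0.

```python
import numpy as np

# One row from each of the three tables (the AGO row in the question).
row_a = np.array([2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0])
row_b = np.array([13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1])
row_c = np.array([0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0])

# Mask from the answer: the product is nonzero only where all three are nonzero.
mask_product = (row_a * row_b * row_c) != 0

# Alternative mask built from explicit comparisons (no multiplication involved).
mask_explicit = (row_a != 0) & (row_b != 0) & (row_c != 0)

# Boolean indexing keeps the same positions in every row,
# so the three results always have the same length.
kept_a = row_a[mask_explicit]
kept_b = row_b[mask_explicit]
kept_c = row_c[mask_explicit]

print(mask_product)
print(kept_a, kept_b, kept_c)
```

Because one shared mask is applied to all rows, there is no need to propagate zeros between the rows first, which is what the chain of np.where calls in the question does.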

If you use pandas 0.24+, a better way to convert to NumPy arrays is to_numpy:

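X = a.shape[0]          # number of rows
d = a.index.to_numpy()  # index labels
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
# True where the value is nonzero in all three arrays
m = (a * b * c) != 0
L1 = [a, b, c]

for i in range(X):
    for arr in L1:
        # boolean indexing keeps the same positions in every array
        xdata = arr[i][m[i]]
        print(xdata)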
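
As a quick check of the conversion itself, the sketch below compares `to_numpy` with `np.array` on a small frame. The frame `df` is made up for illustration from two rows of the first table in the question; only `DataFrame.to_numpy` and `Index.to_numpy` are the calls the answer refers to.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (part of the AFG and AGO rows of the first table).
df = pd.DataFrame(
    {"1980": [0.0, 2.0], "1985": [0.0, 0.0], "1990": [3.8, 3.0]},
    index=pd.Index(["AFG", "AGO"], name="ISO3"),
)

values_new = df.to_numpy()      # pandas 0.24+
values_old = np.array(df)       # conversion used in the question
labels = df.index.to_numpy()    # replaces df.index.values

print(values_new.shape, labels)
```

Both conversions should give the same values here, so switching to `to_numpy` is a matter of using the recommended API rather than changing the result.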

Edit:

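X = a.shape[0]          # number of rows
d = a.index.to_numpy()  # index labels
a = a.to_numpy()
b = b.to_numpy()
c = c.to_numpy()
# True where the value is nonzero in all three arrays
m = (a * b * c) != 0
L1 = [a, b, c]

for i in range(X):
    out = []
    for arr in L1:
        xdata = arr[i][m[i]]
        out.append(xdata)
    # stack the three selected rows into one 2D array per index label
    data = np.vstack(out)
    print(data)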

[]
[[ 3.   4. ]
 [15.8 16.4]
 [ 4.7  5.8]]
[[0.2 0.5 0.2 1.3 1.6 2.7]
 [1.5 1.6 1.6 1.6 1.6 1.7]
 [0.2 0.5 0.2 1.3 1.6 2.7]]
[]
[]
[[ 3.1  6.7  5.3 15.1 17.2 18.2 18.7]
 [ 1.8  1.8  1.7  1.8  1.8  1.9  1.9]
 [ 3.1  6.7  5.3 15.1 17.2 18.2 18.7]]
[[0.4 0.5 0.5 0.4 1.2 1.3]
 [1.8 1.8 1.7 1.8 1.9 1.5]
 [0.4 0.5 0.5 0.4 1.2 1.3]]
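
For reuse, the same steps can be wrapped in a function that returns the stacked arrays keyed by index label. This is only a sketch under a few assumptions: the helper name `common_nonzero` is hypothetical, all frames are assumed to share the same index and column order (the question says they have the same shape), labels with nothing left are skipped, and the sample data is limited to the AFG, AGO and ARM rows from the question.

```python
import numpy as np
import pandas as pd

years = ["1980", "1985", "1990", "1995", "2000", "2005", "2010"]
labels = pd.Index(["AFG", "AGO", "ARM"], name="ISO3")

# Three rows from each table in the question (AFG, AGO, ARM).
a = pd.DataFrame([[0.0, 0.0, 3.8, 0.0, 0.0, 9.8, 0.0],
                  [2.0, 0.0, 3.0, 4.0, 0.0, 0.0, 0.0],
                  [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]],
                 index=labels, columns=years)
b = pd.DataFrame([[2.5, 0.0, 0.0, 4.7, 0.0, 0.0, 0.0],
                  [13.1, 14.9, 15.8, 16.4, 16.9, 17.6, 18.1],
                  [1.8, 1.8, 1.7, 0.0, 1.8, 1.9, 1.5]],
                 index=labels, columns=years)
c = pd.DataFrame([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 4.7, 5.8, 6.0, 0.0, 0.0],
                  [0.4, 0.5, 0.5, 0.5, 0.4, 1.2, 1.3]],
                 index=labels, columns=years)


def common_nonzero(frames):
    """Hypothetical helper: map each index label to a 2D array with one
    row per frame, keeping only the columns that are nonzero in every frame.
    Assumes all frames share the same index and column order."""
    arrays = [frame.to_numpy() for frame in frames]
    # True where the value is nonzero in every frame.
    mask = np.logical_and.reduce([arr != 0 for arr in arrays])
    result = {}
    for i, label in enumerate(frames[0].index):
        if mask[i].any():  # skip labels where nothing is left
            result[label] = np.vstack([arr[i][mask[i]] for arr in arrays])
    return result


selected = common_nonzero([a, b, c])
for label, block in selected.items():
    print(label, block.shape)
    print(block)
```

The mask is still computed once for the whole table; the loop over rows remains only because each label can keep a different number of columns, so the results cannot be stored in one rectangular array.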