python - How to make a list of DataFrames all the same length

Reposted. Author: 行者123. Last updated: 2023-11-30 22:45:56

If I have many DataFrames in a list like this:

import numpy as np
import pandas as pd

X = pd.DataFrame({"t":[1,2,3,4,5,6,7,8],"A":[34,12,78,84,26,84,26,34], "B":[54,87,35,25,82,35,25,82], "C":[56,78,0,14,13,0,14,13], "D":[0,23,72,56,14,72,56,14], "E":[78,12,31,0,34,31,0,34]})
Y = pd.DataFrame({"t":[1,2,3],"A":[45,24,65], "B":[45,87,65], "C":[98,52,32], "D":[0,23,1], "E":[24,12, 65]})
Z = pd.DataFrame({"t":[1,2,3,4,5],"A":[14,96,25,2,25], "B":[47,7,5,58,34], "C":[85,45,65,53,53], "D":[3,35,12,56,236], "E":[68,10,45,46,85]})

allFiles = [X, Y, Z]
list_ = []
for file_ in allFiles:
    # DataFrame.sort was removed in pandas 0.20; sort_values replaces it
    df = file_.sort_values('t')
    list_.append(df)

The list looks like this:

[image in the original post showing the list of DataFrames; not reproduced here]

How can I shorten every DataFrame to the length of the shortest one?

Edit: Keep in mind that I want to keep the list of DataFrames.

Best answer

You can use concat with dropna, provided the DataFrames themselves contain no NaN values:

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
print (df)
    A                        B                                  C              \
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0


      D     E    t
0   3.0  68.0  1.0
1  35.0  10.0  2.0
2  12.0  45.0  3.0
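
Why this trims every frame to the shortest length: concat with axis=1 aligns rows on the index, so rows that exist only in the longer frames get NaN in the columns of the shorter ones, and dropna then discards those rows. The following is a minimal sketch of that mechanism, assuming the default 0..n-1 index; the frames long_df and short_df and the keys "long" and "short" are made up for illustration. It also shows why the shorter frames print as floats in the output above.

```python
import pandas as pd

# Hypothetical stand-in frames with the default 0..n-1 index.
long_df = pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]})
short_df = pd.DataFrame({"t": [1, 2], "A": [45, 24]})

# Side-by-side concat aligns rows on the index labels.
wide = pd.concat([long_df, short_df], keys=["long", "short"], axis=1)

# Rows 2 and 3 exist only in long_df, so the "short" columns hold NaN there.
rows_with_gaps = wide.index[wide.isna().any(axis=1)].tolist()

# dropna() keeps only the rows that are present in every frame.
trimmed = wide.dropna()

# NaN cannot be stored in an integer column, so the padded columns became float.
short_dtype = trimmed[("short", "A")].dtype
long_dtype = trimmed[("long", "A")].dtype
```

This is also why the approach requires that the original frames contain no NaN of their own: dropna cannot tell padding apart from genuinely missing values.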

Then build the new list with groupby in a list comprehension (note that groupby with axis=1 is deprecated as of pandas 2.1):

list_ = [g for i, g in df.groupby(level=0, axis=1, group_keys=False)]
print (list_)
[    A
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       B
      A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       C
      A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

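However, the output has MultiIndex columns, so store the first level with get_level_values for use in groupby, then remove that level from the columns with droplevel: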

df = pd.concat(allFiles, keys=list('ABC'), axis=1).dropna()
lvl = df.columns.get_level_values(0)
df.columns = df.columns.droplevel(0)
print (df)
    A   B   C   D   E  t     A     B     C     D     E    t     A     B     C  \
0  34  54  56   0  78  1  45.0  45.0  98.0   0.0  24.0  1.0  14.0  47.0  85.0
1  12  87  78  23  12  2  24.0  87.0  52.0  23.0  12.0  2.0  96.0   7.0  45.0
2  78  35   0  72  31  3  65.0  65.0  32.0   1.0  65.0  3.0  25.0   5.0  65.0

      D     E    t
0   3.0  68.0  1.0
1  35.0  10.0  2.0
2  12.0  45.0  3.0

list_ = [g for i, g in df.groupby(lvl, axis=1)]

print (list_)

[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,       A     B     C     D     E    t
0  45.0  45.0  98.0   0.0  24.0  1.0
1  24.0  87.0  52.0  23.0  12.0  2.0
2  65.0  65.0  32.0   1.0  65.0  3.0,       A     B     C     D     E    t
0  14.0  47.0  85.0   3.0  68.0  1.0
1  96.0   7.0  45.0  35.0  10.0  2.0
2  25.0   5.0  65.0  12.0  45.0  3.0]

print (list_[0])
    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3
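
Because recent pandas releases deprecate groupby with axis=1, it may help to split the wide frame without it. The sketch below is one possible variant rather than the answer's own method: it selects each top-level key directly, which returns that block with plain column names, so droplevel is not needed either. It also casts each block back to the dtypes of its source frame to undo the float conversion. The frames and the keys "X" and "Y" are illustrative stand-ins.

```python
import pandas as pd

# Hypothetical stand-in frames with the default 0..n-1 index.
frames = [
    pd.DataFrame({"t": [1, 2, 3, 4], "A": [34, 12, 78, 84]}),
    pd.DataFrame({"t": [1, 2], "A": [45, 24]}),
]
keys = ["X", "Y"]

# Same first step as above: align side by side, drop rows missing anywhere.
wide = pd.concat(frames, keys=keys, axis=1).dropna()

# wide[key] picks one top-level block and returns it with single-level columns.
# astype restores the source dtypes, which is safe here because dropna has
# already removed every NaN.
trimmed = [
    wide[key].astype(src.dtypes.to_dict())
    for key, src in zip(keys, frames)
]
```

Each element of trimmed has the same columns and dtypes as its source frame, cut to the rows shared by all frames.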

Another, simpler solution:

allFiles = [X, Y, Z]

min_len = np.min([len(df.index) for df in allFiles])
print (min_len)
3

print ([df.reindex(np.arange(min_len)) for df in allFiles])
[    A   B   C   D   E  t
0  34  54  56   0  78  1
1  12  87  78  23  12  2
2  78  35   0  72  31  3,     A   B   C   D   E  t
0  45  45  98   0  24  1
1  24  87  52  23  12  2
2  65  65  32   1  65  3,     A   B   C   D   E  t
0  14  47  85   3  68  1
1  96   7  45  35  10  2
2  25   5  65  12  45  3]
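
Note that reindex(np.arange(min_len)) selects rows by label, so it relies on every frame having the default 0..n-1 index in the desired order. After sorting by t, as in the question, the labels may no longer be in that order. A positional slice with iloc avoids the dependency. The sketch below uses made-up, deliberately unsorted frames to illustrate this; it is an alternative to the reindex call, not part of the original answer.

```python
import pandas as pd

# Hypothetical frames whose rows are not yet ordered by "t".
frames = [
    pd.DataFrame({"t": [3, 1, 2, 4], "A": [78, 34, 12, 84]}),
    pd.DataFrame({"t": [2, 1], "A": [24, 45]}),
]

# sort_values keeps the original labels, so the index is no longer 0, 1, 2, ...
sorted_frames = [df.sort_values("t") for df in frames]

min_len = min(len(df) for df in sorted_frames)

# iloc slices by position: the first min_len rows in sorted order,
# whatever their index labels happen to be.
trimmed = [df.iloc[:min_len] for df in sorted_frames]
```

If a clean 0..n-1 index is wanted afterwards, reset_index(drop=True) can be applied to each trimmed frame.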

Edit 1: A solution for the case where t is an index with unique values.

Get the shortest index, then use reindex in a list comprehension:

X = X.set_index('t')
Y = Y.set_index('t')
Z = Z.set_index('t')
allFiles = [X, Y, Z]

min_idx = min([df.index for df in allFiles], key=len)
print (min_idx)
Int64Index([1, 2, 3], dtype='int64', name='t')

print ([df.reindex(min_idx) for df in allFiles])
[    A   B   C   D   E
t
1  34  54  56   0  78
2  12  87  78  23  12
3  78  35   0  72  31,     A   B   C   D   E
t
1  45  45  98   0  24
2  24  87  52  23  12
3  65  65  32   1  65,     A   B   C   D   E
t
1  14  47  85   3  68
2  96   7  45  35  10
3  25   5  65  12  45]
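
reindex(min_idx) gives clean results here because every label in the shortest index also appears in the other frames. If that does not hold, reindex adds all-NaN rows for the missing labels. One possible alternative, sketched below with made-up frames, keeps only the labels common to all frames by intersecting the indexes. This is a different technique from the answer's: the result may be shorter than the shortest frame.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames indexed by "t"; the second index is NOT a subset of the first.
frames = [
    pd.DataFrame({"A": [34, 12, 78, 84]}, index=pd.Index([1, 2, 3, 4], name="t")),
    pd.DataFrame({"A": [45, 24, 65]}, index=pd.Index([2, 3, 5], name="t")),
]

# Fold Index.intersection over all indexes to find labels shared by every frame.
common = reduce(
    lambda left, right: left.intersection(right),
    [df.index for df in frames],
)

# .loc is safe here because every label in `common` exists in every frame.
trimmed = [df.loc[common] for df in frames]
```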

Regarding "python - How to make a list of DataFrames all the same length", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41038658/