
python - Shifting the rows of a pd.DataFrame based on the value in a specific cell

Reposted. Author: 行者123. Updated: 2023-12-04 10:54:19

Imagine we have a DataFrame structured as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 5, 4, 5, 5, 4],
    'C': [0, 0, 0, 0, 0, 7],
    'D': [0, 1, 3, 5, 7, 1],
    'E': [5, 3, 6, 9, 2, 4],
})

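The general idea is to shift each row according to the value in its 'Year' column. 2017 is the base year, so each row should be shifted to the right by (Year - 2017) cells, and the cells that open up should be filled with zeros (0), like this:

df = pd.DataFrame({
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 0, 0, 5, 5, 4],
    'C': [0, 0, 4, 0, 0, 7],
    'D': [0, 5, 0, 5, 7, 1],
    'E': [5, 0, 3, 9, 2, 4],
    'F': [0, 1, 6, 0, 0, 0],
    'G': [0, 3, 0, 0, 0, 0],
})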

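PS: in practice we then need to sum some of the resulting rows pairwise, so that each column refers to the same year.

Summing rows 0 and 2 is only the first step; next come rows 1 and 3, and so on.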

So perhaps there is some pandas functionality that can handle this task without shifting the rows first...

Best answer

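With the default behaviour of shift in pandas, the values pushed past the last column are lost. So you first need to add new columns filled with missing values; the number of columns depends on the largest difference between the non-2017 years and 2017.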
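To make the point about lost values concrete, here is a minimal sketch on a single row. The one-row frame and the names `row`, `padded`, `lost` and `kept` are made up for illustration; only `shift(..., axis=1)` and `assign` come from the answer.

```python
import numpy as np
import pandas as pd

# One-row frame (illustrative data); float dtype so NaN fits without dtype changes.
row = pd.DataFrame({"B": [5.0], "C": [0.0], "D": [1.0], "E": [3.0]})

# Shifting two positions to the right pushes the values of D and E off the end.
lost = row.shift(2, axis=1)

# Adding two empty columns first gives the shifted values room to land.
padded = row.assign(new0=np.nan, new1=np.nan)
kept = padded.shift(2, axis=1)

print(lost)
print(kept)
```

In `lost` only the original B and C values survive (now under D and E), while `kept` still holds all four values, moved two columns to the right.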

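# Use Year as the index so rows can be selected by year
df = df.set_index('Year')

# Years other than the 2017 baseline
diff = np.setdiff1d(df.index.dropna().unique(), [2017]).astype(int)
print(diff)
[2018 2019]

# One empty column per position of the largest shift
df = df.assign(**{f'new{x}': np.nan for x in range(max(diff - 2017))})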

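Then you can use shift in a loop and select the rows for each year by index with DataFrame.loc:

df = df.astype(float)  # float dtype so NaN can be stored in every column
for y in diff:
    df.loc[y, :] = df.shift(y - 2017, axis=1).loc[y, :]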

Finally, replace the missing values, convert to integers, and turn the index back into a column:

df = df.fillna(0).astype(int).reset_index()
print(df)
   Year  B  C  D  E  new0  new1
0  2017  4  0  0  5     0     0
1  2019  0  0  5  0     1     3
2  2018  0  4  0  3     6     0
3  2017  5  0  5  9     0     0
4  2017  5  0  7  2     0     0
5  2017  4  7  1  4     0     0
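As an alternative to the shift loop, the same result can be sketched with NumPy integer indexing. This is not part of the original answer; it assumes that 'Year' is the only non-value column and that no year is earlier than the 2017 baseline. The names `base_year`, `offsets`, `out` and `shifted` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2017, 2019, 2018, 2017, 2017, 2017],
    "B": [4, 5, 4, 5, 5, 4],
    "C": [0, 0, 0, 0, 0, 7],
    "D": [0, 1, 3, 5, 7, 1],
    "E": [5, 3, 6, 9, 2, 4],
})

base_year = 2017  # baseline from the question
value_cols = df.columns.drop("Year")
values = df[value_cols].to_numpy()
offsets = (df["Year"] - base_year).to_numpy()  # assumption: all offsets are >= 0

n_rows, n_cols = values.shape
extra = int(offsets.max())

# Zero-filled target, wide enough for the largest shift.
out = np.zeros((n_rows, n_cols + extra), dtype=values.dtype)

# Target column of each value = original position + that row's offset.
rows = np.arange(n_rows)[:, None]
cols = np.arange(n_cols)[None, :] + offsets[:, None]
out[rows, cols] = values

new_names = [f"new{i}" for i in range(extra)]
shifted = pd.DataFrame(out, columns=list(value_cols) + new_names)
shifted.insert(0, "Year", df["Year"])
print(shifted)
```

Because the target array starts out as zeros, no fillna or dtype conversion is needed afterwards, and the rows are written in one assignment instead of one pass per year.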

Edit:

A solution for the case with an additional column:
df = pd.DataFrame({
    'new': list('abcdef'),
    'Year': [2017, 2019, 2018, 2017, 2017, 2017],
    'B': [4, 5, 4, 5, 5, 4],
    'C': [0, 0, 0, 0, 0, 7],
    'D': [0, 1, 3, 5, 7, 1],
    'E': [5, 3, 6, 9, 2, 4]})
print(df)
  new  Year  B  C  D  E
0   a  2017  4  0  0  5
1   b  2019  5  0  1  3
2   c  2018  4  0  3  6
3   d  2017  5  0  5  9
4   e  2017  5  0  7  2
5   f  2017  4  7  1  4
df = df.set_index(['new', 'Year'])

diff = np.setdiff1d(df.index.get_level_values('Year').dropna().unique(), [2017]).astype(int)
print(diff)
[2018 2019]

df1 = pd.DataFrame(index=df.index, columns=['new{}'.format(x) for x in range(max(diff - 2017))])
df = pd.concat([df, df1], axis=1)
print(df)
          B  C  D  E new0 new1
new Year
a   2017  4  0  0  5  NaN  NaN
b   2019  5  0  1  3  NaN  NaN
c   2018  4  0  3  6  NaN  NaN
d   2017  5  0  5  9  NaN  NaN
e   2017  5  0  7  2  NaN  NaN
f   2017  4  7  1  4  NaN  NaN
df = df.astype(float)  # float dtype so NaN can be stored in every column
idx = pd.IndexSlice
for y in diff:
    df.loc[idx[:, y], :] = df.shift(y - 2017, axis=1).loc[idx[:, y], :]

df = df.fillna(0).astype(int).reset_index()
print(df)
  new  Year  B  C  D  E  new0  new1
0   a  2017  4  0  0  5     0     0
1   b  2019  0  0  5  0     1     3
2   c  2018  0  4  0  3     6     0
3   d  2017  5  0  5  9     0     0
4   e  2017  5  0  7  2     0     0
5   f  2017  4  7  1  4     0     0
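The `idx[:, y]` selection in the loop above may be unfamiliar, so here is a small sketch of what it picks on a two-level index. The three-row frame and the name `picked` are made up for illustration.

```python
import pandas as pd

# Small frame with the same two index levels as in the answer (illustrative data).
df = pd.DataFrame({
    "new": list("abc"),
    "Year": [2017, 2019, 2018],
    "B": [4, 5, 4],
}).set_index(["new", "Year"])

idx = pd.IndexSlice

# Every value of the first level ('new'), only 2019 in the second level ('Year').
picked = df.loc[idx[:, 2019], :]
print(picked)
```

The first slot of `idx[...]` addresses the 'new' level and the second the 'Year' level, so the loop can select rows by year without knowing which 'new' labels they carry.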

Regarding "python - Shifting the rows of a pd.DataFrame based on the value in a specific cell", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59299112/
