python - Get first and last value of a series of pairs between two columns of a Pandas dataframe

Reposted. Author: 行者123. Updated: 2023-12-03 17:23:15

I have a dataframe with 3 columns: Replaced_ID, New_ID and Installation Date (of New_ID).
Each New_ID replaces a Replaced_ID.

Replaced_ID  New_ID  Installation Date (of New_ID)
3            5       16/02/2018
5            7       17/05/2019
7            9       21/06/2019
9            11      23/08/2020
25           39      16/02/2017
39           41      16/08/2018

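My goal is to get a dataframe containing only the first and last records of each sequence. I only care about the first Replaced_ID value and the last New_ID value.
That is, from the dataframe above I want this:
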
Replaced_ID  New_ID  Installation Date (of New_ID)
3            11      23/08/2020
25           41      16/08/2018

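As far as I can tell, sorting by date and doing a shift is not the solution here.
I also tried joining the New_ID column with Replaced_ID, but that does not work either, because it only returns the previous step of the sequence.
I need a way to get the sequences [3,5,7,9,11] & [25,41] by combining the Replaced_ID & New_ID columns across all rows.
What I care about most is getting the first Replaced_ID value and the last New_ID value, not the Installation Date, because I can always do a join at the end.
Any ideas? Thanks.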

Best answer

First, let's create the DataFrame:

import pandas as pd
import numpy as np
from io import StringIO

data = """Replaced_ID,New_ID,Installation Date (of New_ID)
3,5,16/02/2018
5,7,17/05/2019
7,9,21/06/2019
9,11,23/08/2020
25,39,16/02/2017
39,41,16/08/2018
11,14,23/09/2020
41,42,23/10/2020
"""
# Note: the last two rows were added to check that this also works with non-consecutive rows.

# Shorthands for the column names
r = "Replaced_ID"
n = "New_ID"
i = "Installation Date (of New_ID)"

df = pd.read_csv(StringIO(data), header=0, sep=",")
# The dates are day-first (dd/mm/yyyy), so state the format explicitly
df[i] = pd.to_datetime(df[i], format="%d/%m/%Y")

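
The dates in this data are written day-first (dd/mm/yyyy). As a small aside, the sketch below shows how pd.to_datetime reads an ambiguous string with its default settings, with dayfirst=True, and with an explicit format. The string 03/04/2020 is a made-up value for illustration, not part of the question's data.

```python
import pandas as pd

# Hypothetical ambiguous date: both 03 and 04 are valid month numbers.
ambiguous = "03/04/2020"

# Default (dayfirst=False): read as month/day/year, i.e. 4 March 2020.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: read as day/month/year, i.e. 3 April 2020.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

# An explicit format removes the guesswork entirely.
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")

print(month_first.date(), day_first.date(), explicit.date())
```

Values such as 16/02/2018 can only be read day-first, but rows where the day is 12 or less are where an explicit format or dayfirst=True starts to matter.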
Now for my actual solution:

a = df[[r, n]].values.flatten()
# Flat array of all r and n values; IDs in the middle of a chain appear twice:
# [ 3  5  5  7  7  9  9 11 25 39 39 41 11 14 41 42]

# Keep only the values that occur exactly once: the first Replaced_ID and the
# last New_ID of each chain. np.unique returns them sorted, so pairing them up
# two by two assumes that IDs grow along a chain and that chains do not interleave:
# [[ 3 14]
#  [25 42]]
u, c = np.unique(a, return_counts=True)
res = u[c == 1].reshape(-1, 2)  # one (first, last) row per chain

# Keep the rows whose New_ID is a chain end (second column of res, i.e. [14, 42]).
# .copy() gives an independent frame, so the assignment below is not a chained assignment.
dfn = df[df[n].isin(res[:, 1])].copy()
# Sort by New_ID so the rows line up with res, then overwrite Replaced_ID with the chain starts.
dfn = dfn.sort_values(n)
dfn[r] = res[:, 0]
print(dfn)

which produces:

   Replaced_ID  New_ID Installation Date (of New_ID)
6            3      14                    2020-09-23
7           25      42                    2020-10-23
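
The counting trick above pairs chain starts with chain ends by sorted order. As an alternative, here is a minimal sketch that follows each chain link by link, so it does not depend on how the IDs are numbered. It is not part of the original answer. The helper name chain_endpoints is hypothetical, and the sketch assumes each ID is replaced at most once and that the data contains no cycles. The final merge is the "join" the question mentions for recovering the Installation Date.

```python
import pandas as pd

date_col = "Installation Date (of New_ID)"
df = pd.DataFrame({
    "Replaced_ID": [3, 5, 7, 9, 25, 39, 11, 41],
    "New_ID": [5, 7, 9, 11, 39, 41, 14, 42],
    date_col: pd.to_datetime(
        ["16/02/2018", "17/05/2019", "21/06/2019", "23/08/2020",
         "16/02/2017", "16/08/2018", "23/09/2020", "23/10/2020"],
        format="%d/%m/%Y",
    ),
})


def chain_endpoints(frame, old="Replaced_ID", new="New_ID"):
    """Hypothetical helper: one row per chain with its first old ID and last new ID.

    Assumes each ID is replaced at most once and there are no cycles.
    """
    successor = dict(zip(frame[old], frame[new]))  # old ID -> ID that replaced it
    replacements = set(frame[new])
    # A chain starts at an ID that never appears as somebody's replacement.
    starts = [k for k in successor if k not in replacements]

    rows = []
    for first in starts:
        last = successor[first]
        while last in successor:  # keep following until nothing replaces `last`
            last = successor[last]
        rows.append({old: first, new: last})
    return pd.DataFrame(rows)


ends = chain_endpoints(df)
# Join back on New_ID to pick up the installation date of the final replacement.
result = ends.merge(df[["New_ID", date_col]], on="New_ID", how="left")
print(result)
```

For the sample data this follows 3 -> 5 -> 7 -> 9 -> 11 -> 14 and 25 -> 39 -> 41 -> 42, so the result holds the pairs (3, 14) and (25, 42) with the dates of the rows that installed 14 and 42.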

Regarding "python - Get first and last value of a series of pairs between two columns of a Pandas dataframe", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64784428/
