
python - Modify data in a DataFrame based on date and string length

Reposted. Author: 行者123. Updated: 2023-12-01 08:33:29

I need to clean up some data in a Pandas DataFrame and I am struggling with it.

Sample data:

Date       | ID     | Name             | Address
-----------------------------------------------------------------------------------------------
1-4-1987 | 124578 | T.Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863
23-6-1990 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury
12-5-1960 | 746732 | Earline Schulist | 57367 Alfredo Vista East Bertaburgh
9-9-2010 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205
27-12-2017 | 124578 | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo
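To reproduce the problem, the sample above can be loaded into a DataFrame along these lines. This is only a sketch: the variable name `df` is chosen to match the code later in the post, and the dates are kept as day-month-year strings exactly as shown in the table.

```python
import pandas as pd

# Sample rows copied from the table above; Date is still a plain string here.
df = pd.DataFrame({
    "Date": ["1-4-1987", "23-6-1990", "12-5-1960", "9-9-2010", "27-12-2017"],
    "ID": [124578, 947383, 746732, 947383, 124578],
    "Name": [
        "T.Hilpert",
        "Birdie Reynolds",
        "Earline Schulist",
        "Birdie Reynolds",
        "Theresia Hilpert",
    ],
    "Address": [
        "518 Hessel Plaza Lake Lonzo, AZ 11863",
        "964 Weissnat Green Suite 568 Rennerbury",
        "57367 Alfredo Vista East Bertaburgh",
        "964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205",
        "518 Hessel Plaza Lake Lonzo",
    ],
})

print(df)
```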

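Here is what I want to do: group by ID, take the Name from the most recent Date, and take the longest Address string. Then use those values for every occurrence of that ID, in two new columns: Name_New and Address_New. The desired result is shown below: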

Date       | ID     | Name             | Address                                                | Name_New         | Address_New
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
27-12-2017 | 124578 | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863
1-4-1987 | 124578 | T.Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863 | Theresia Hilpert | 518 Hessel Plaza Lake Lonzo, AZ 11863
23-6-1990 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205
9-9-2010 | 947383 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205 | Birdie Reynolds | 964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205
12-5-1960 | 746732 | Earline Schulist | 57367 Alfredo Vista East Bertaburgh | Earline Schulist | 57367 Alfredo Vista East Bertaburgh

I have tried the following, but I cannot combine the pieces to get the desired result.

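# Attempt 1: pick the longest address per ID.
def f1(s):
    return max(s, key=len)

# agg() returns one row per ID (indexed by ID), which does not line up with df's rows.
df_new = df['New_Address'] = df.groupby('ID').agg({'Address': f1})

# Attempt 2: keep only the rows holding the maximum Date per ID.
df_new = df[df.groupby('ID').Date.transform('max') == df['Date']]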
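One way to see why these two pieces do not combine directly: `agg` yields one value per group, while `transform` yields one value per original row. In addition, as long as Date is a string, `max` compares text rather than dates. The small sketch below uses made-up toy data (not the sample from the question) to illustrate both points.

```python
import pandas as pd

# Toy data, purely illustrative.
toy = pd.DataFrame({
    "ID": [1, 2, 1],
    "Address": ["short", "only one", "a longer address"],
})

def longest(s):
    # Longest string in the group (first one wins on ties).
    return max(s, key=len)

per_group = toy.groupby("ID")["Address"].agg(longest)       # indexed by ID
per_row = toy.groupby("ID")["Address"].transform(longest)   # indexed like toy

print(len(per_group), len(per_row))

# String comparison is character by character, so "2-1-2020" sorts after
# "10-1-2020" even though 10 January is the later date.
print(max(["2-1-2020", "10-1-2020"]))
```

Only the `transform` result has the same index as the original frame, so it can be assigned as a new column without realignment.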

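Thanks in advance for any help.

Best answer

Use transform, which returns a Series the same size as the original DataFrame. For the name, set the Name column as the index and use idxmax, which returns the index label (here, the Name) at the maximum Date within each ID: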

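import pandas as pd
# Parse the day-month-year strings so that "max" means the latest date.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
# Longest address per ID, broadcast back to every row of that ID.
df['Address_New'] = df.groupby('ID')['Address'].transform(lambda s: max(s, key=len))
# With Name as the index, idxmax gives the Name at the latest Date per ID.
df['Name_New'] = df.set_index('Name').groupby('ID')['Date'].transform('idxmax').values
print(df)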
Date ID Name \
0 1987-04-01 124578 T.Hilpert
1 1990-06-23 947383 Birdie Reynolds
2 1960-05-12 746732 Earline Schulist
3 2010-09-09 947383 Birdie Reynolds
4 2017-12-27 124578 Theresia Hilpert

Address \
0 518 Hessel Plaza Lake Lonzo, AZ 11863
1 964 Weissnat Green Suite 568 Rennerbury
2 57367 Alfredo Vista East Bertaburgh
3 964 Weissnat Green Suite 568 Rennerbury, WV 16...
4 518 Hessel Plaza Lake Lonzo

Address_New Name_New
0 518 Hessel Plaza Lake Lonzo, AZ 11863 Theresia Hilpert
1 964 Weissnat Green Suite 568 Rennerbury, WV 16... Birdie Reynolds
2 57367 Alfredo Vista East Bertaburgh Earline Schulist
3 964 Weissnat Green Suite 568 Rennerbury, WV 16... Birdie Reynolds
4 518 Hessel Plaza Lake Lonzo, AZ 11863 Theresia Hilpert
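For comparison, the same result can probably also be reached by computing one value per ID and mapping it back onto the rows. This is an alternative sketch, not the answer's method; the helper names `latest_idx`, `latest_name` and `longest_addr` are made up for illustration, and it assumes there are no missing addresses (`len` would fail on NaN).

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1-4-1987", "23-6-1990", "12-5-1960", "9-9-2010", "27-12-2017"],
    "ID": [124578, 947383, 746732, 947383, 124578],
    "Name": ["T.Hilpert", "Birdie Reynolds", "Earline Schulist",
             "Birdie Reynolds", "Theresia Hilpert"],
    "Address": [
        "518 Hessel Plaza Lake Lonzo, AZ 11863",
        "964 Weissnat Green Suite 568 Rennerbury",
        "57367 Alfredo Vista East Bertaburgh",
        "964 Weissnat Green Suite 568 Rennerbury, WV 16241-5205",
        "518 Hessel Plaza Lake Lonzo",
    ],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Row label of the latest Date within each ID.
latest_idx = df.groupby("ID")["Date"].idxmax()
# Lookup table ID -> Name taken from those rows.
latest_name = df.loc[latest_idx].set_index("ID")["Name"]
# Lookup table ID -> longest Address (first one wins on ties).
longest_addr = df.groupby("ID")["Address"].agg(lambda s: max(s, key=len))

# Map the per-ID values back onto every row.
df["Name_New"] = df["ID"].map(latest_name)
df["Address_New"] = df["ID"].map(longest_addr)

print(df[["ID", "Name_New", "Address_New"]])
```

Keeping the per-ID lookups as separate Series makes them easy to inspect before they are written back, at the cost of a few more lines than the transform version.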

Regarding "python - Modify data in a DataFrame based on date and string length", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53828903/
