python - Split or merge actions by date

Reposted. Author: 行者123. Last updated: 2023-12-01 09:19:18

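I would like to build a sequence dataset from different events (ACT) that happen on the same or on different dates. As you can see, some rows contain missing (NaN/NaT) values. I need the final data to train a machine learning model on event sequences.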

ID  ACT1        ACT2        ACT3        ACT4        ACT5
0   2015-08-11  2015-08-16  2015-08-16  2015-09-22  2015-08-19
1   2014-07-16  2014-07-16  2014-09-16  NaT         2014-09-12
2   2016-07-16  NaT         2017-09-16  2017-09-16  2017-12-16
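The sketch below shows one way to load the sample table as a DataFrame so it can be reproduced. It assumes the table is pasted as whitespace-separated text. `na_values=['NaT']` is used so the literal string `NaT` is read as missing before the columns are converted to datetimes.

```python
import io

import pandas as pd

# Sample table from the question, pasted as whitespace-separated text
text = """ID  ACT1        ACT2        ACT3        ACT4        ACT5
0 2015-08-11 2015-08-16 2015-08-16 2015-09-22 2015-08-19
1 2014-07-16 2014-07-16 2014-09-16 NaT 2014-09-12
2 2016-07-16 NaT 2017-09-16 2017-09-16 2017-12-16
"""

# Treat the literal string 'NaT' as missing while parsing
df = pd.read_csv(io.StringIO(text), sep=r'\s+', index_col='ID', na_values=['NaT'])

# Convert every ACT column to datetime64; missing entries become NaT
df = df.apply(pd.to_datetime)
print(df.dtypes)
```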

The expected output splits or merges the activities according to their dates, as in the table below:

ID  Sequence1  Sequence2  Sequence3  Sequence4
0   ACT1       ACT2,ACT3  ACT5       ACT4
1   ACT1,ACT2  ACT5       ACT3
2   ACT1       ACT3,ACT4  ACT5

The following script only outputs a single string containing the whole sequence:

cols = ['ACT1', 'ACT2', 'ACT3', 'ACT4', 'ACT5']
df['Sequence'] = df[cols].apply(lambda dr: ','.join(dr.dropna().sort_values(kind='mergesort').index), axis=1)

Sequence
ACT1,ACT2,ACT3,ACT5,ACT4
ACT1,ACT2,ACT5,ACT3
ACT1,ACT3,ACT4,ACT5
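The script above sorts each row but joins every activity into one string. The missing step is to start a new group wherever the date changes. Below is a minimal per-row sketch that assumes the five ACT columns from the question. `row_to_sequences` is a hypothetical helper. Short rows are padded with empty strings only to mirror the blank cells in the expected output.

```python
from itertools import groupby

import pandas as pd

# Sample data from the question (None becomes NaT)
df = pd.DataFrame({
    'ACT1': pd.to_datetime(['2015-08-11', '2014-07-16', '2016-07-16']),
    'ACT2': pd.to_datetime(['2015-08-16', '2014-07-16', None]),
    'ACT3': pd.to_datetime(['2015-08-16', '2014-09-16', '2017-09-16']),
    'ACT4': pd.to_datetime(['2015-09-22', None, '2017-09-16']),
    'ACT5': pd.to_datetime(['2015-08-19', '2014-09-12', '2017-12-16']),
})
cols = ['ACT1', 'ACT2', 'ACT3', 'ACT4', 'ACT5']


def row_to_sequences(row):
    """Hypothetical helper: join activity names that share a date, in date order."""
    # Stable sort keeps column order (ACT2 before ACT3) for activities on the same date
    ordered = row.dropna().sort_values(kind='mergesort')
    # Consecutive entries with equal dates form one sequence step
    return [','.join(name for name, _ in grp)
            for _, grp in groupby(ordered.items(), key=lambda kv: kv[1])]


rows = [row_to_sequences(row) for _, row in df[cols].iterrows()]
n_seq = max(len(r) for r in rows)

# Right-pad shorter rows with '' so every row has n_seq cells
result = pd.DataFrame(
    [r + [''] * (n_seq - len(r)) for r in rows],
    index=df.index,
    columns=['Sequence{}'.format(i) for i in range(1, n_seq + 1)],
)
print(result)
```

A stable sort matters here. Without it, two activities on the same date could be joined as `ACT3,ACT2` instead of `ACT2,ACT3`.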

Best answer

This was a tricky one, but I believe this will work for you.

from collections import defaultdict

import numpy as np
import pandas as pd

data = {
    'ACT1': [pd.Timestamp(year=2015, month=8, day=11),
             pd.Timestamp(year=2014, month=7, day=16),
             pd.Timestamp(year=2016, month=7, day=16)],
    'ACT2': [pd.Timestamp(year=2015, month=8, day=16),
             pd.Timestamp(year=2014, month=7, day=16),
             np.nan],
    'ACT3': [pd.Timestamp(year=2015, month=8, day=16),
             pd.Timestamp(year=2014, month=9, day=16),
             pd.Timestamp(year=2017, month=9, day=16)],
    'ACT4': [pd.Timestamp(year=2015, month=9, day=22),
             np.nan,
             pd.Timestamp(year=2017, month=9, day=16)],
    'ACT5': [pd.Timestamp(year=2015, month=8, day=19),
             pd.Timestamp(year=2014, month=9, day=12),
             pd.Timestamp(year=2017, month=12, day=16)]}

df = pd.DataFrame(data)

# Unstack so we can create groups
unstacked = df.unstack().reset_index()

# This will keep track of our sequence data
sequences = defaultdict(list)

# Here we get our groups, e.g., 'ACT1,ACT2', etc.
# We group by date first, then by original row index (0, 1, 2);
# groupby sorts the keys by date and drops NaT dates
for i, g in unstacked.groupby([0, 'level_1']):
    sequences[i[1]].append(','.join(g.level_0))

# How many sequences (columns) we're going to need
n_seq = len(max(sequences.values(), key=len))

# Rows with NaTs or with shared dates end up with fewer sequences,
# so we right-pad them with empty strings
for k in sequences:
    while len(sequences[k]) < n_seq:
        sequences[k].append('')

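# Create column labels and make the new dataframe.
# Sort by row index: dict insertion order follows the earliest date, not the row number
columns = ['Sequence{}'.format(i) for i in range(1, n_seq + 1)]
rows = sorted(sequences)
print(pd.DataFrame([sequences[k] for k in rows], index=rows, columns=columns))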

Sequence1 Sequence2 Sequence3 Sequence4
0 ACT1 ACT2,ACT3 ACT5 ACT4
1 ACT1,ACT2 ACT5 ACT3
2 ACT1 ACT3,ACT4 ACT5
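The same reshaping can also be written without the explicit `defaultdict` loop, using `melt`, `groupby`, `cumcount` and `pivot`. This is an alternative sketch, not the answer's method. It assumes the activity names sort correctly as strings, which holds for ACT1 to ACT9 but not for ACT10. Missing cells are filled with empty strings, as in the output above.

```python
import pandas as pd

# Sample data from the question (None becomes NaT)
df = pd.DataFrame({
    'ACT1': pd.to_datetime(['2015-08-11', '2014-07-16', '2016-07-16']),
    'ACT2': pd.to_datetime(['2015-08-16', '2014-07-16', None]),
    'ACT3': pd.to_datetime(['2015-08-16', '2014-09-16', '2017-09-16']),
    'ACT4': pd.to_datetime(['2015-09-22', None, '2017-09-16']),
    'ACT5': pd.to_datetime(['2015-08-19', '2014-09-12', '2017-12-16']),
})

# Long format: one row per (ID, activity, date); NaT dates are dropped
long_df = (df.rename_axis('ID').reset_index()
             .melt(id_vars='ID', var_name='ACT', value_name='date')
             .dropna(subset=['date']))

# Join activities sharing a date within each ID (string order of ACT names)
merged = (long_df.sort_values(['ID', 'date', 'ACT'])
                 .groupby(['ID', 'date'])['ACT']
                 .agg(','.join)
                 .reset_index())

# Number the date groups within each ID in date order: 1, 2, 3, ...
merged['seq'] = merged.groupby('ID').cumcount() + 1

# Back to wide format; IDs with fewer date groups get '' in trailing cells
wide = merged.pivot(index='ID', columns='seq', values='ACT').fillna('')
wide.columns = ['Sequence{}'.format(c) for c in wide.columns]
print(wide)
```

`cumcount` numbers each ID's date groups in date order, so rows with fewer distinct dates automatically end up with empty trailing cells. This replaces the manual right-padding loop.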

A similar question, "python - Split or merge actions by date", can be found on Stack Overflow: https://stackoverflow.com/questions/50932977/
