
python - Create and populate Pandas DataFrame columns using two existing columns

Reposted. Author: 太空宇宙. Updated: 2023-11-04 09:40:02

My dataframe has 4 columns, as shown below.

What I have:

ID     start_date  end_date    active
1,111  6/30/2015   8/6/1904    1 to 10
1,111  6/28/2016   3/30/1905   1 to 10
1,111  7/31/2017   6/6/1905    1 to 10
1,111  7/31/2018   6/6/1905    1 to 9
1,111  5/31/2019   12/4/1904   1 to 9
3,033  3/31/2015   5/18/1908   3 to 7
3,033  3/31/2016   11/24/1905  3 to 7
3,033  3/31/2017   1/20/1906   3 to 7
3,033  3/31/2018   1/8/1906    2 to 7
3,033  4/4/2019    2200,0      2 to 8

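I want to generate 10 more columns based on the values in the "active" column, as shown below. Is there an efficient way to populate them?

What I want to achieve: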

ID     start_date  end_date    active   Type 1  Type 2  Type 3  Type 4  Type 5  Type 6  Type 7  Type 8  Type 9  Type 10
1,111  6/30/2015   8/6/1904    1 to 10  1       1       1       1       1       1       1       1       1       1
1,111  6/28/2016   3/30/1905   1 to 10  1       1       1       1       1       1       1       1       1       1
1,111  7/31/2017   6/6/1905    1 to 10  1       1       1       1       1       1       1       1       1       1
1,111  7/31/2018   6/6/1905    1 to 9   1       1       1       1       1       1       1       1       1
1,111  5/31/2019   12/4/1904   1 to 9   1       1       1       1       1       1       1       1       1
3,033  3/31/2015   5/18/1908   3 to 7                   1       1       1       1       1
3,033  3/31/2016   11/24/1905  3 to 7                   1       1       1       1       1
3,033  3/31/2017   1/20/1906   3 to 7                   1       1       1       1       1
3,033  3/31/2018   1/8/1906    2 to 7           1       1       1       1       1       1
3,033  4/4/2019    2200,0      2 to 8           1       1       1       1       1       1       1

Best answer

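Use a custom function with np.arange:

import numpy as np
import pandas as pd

def f(x):
    # Parse "a to b" into the two integer bounds
    a = list(map(int, x.split(' to ')))
    # Series of 1s labelled with every integer from a to b inclusive
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

df = df.join(df['active'].apply(f).add_prefix('Type '))
print(df)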
ID start_date end_date active Type 1 Type 2 Type 3 Type 4 \
0 1,111 6/30/2015 8/6/1904 1 to 10 1.0 1.0 1.0 1.0
1 1,111 6/28/2016 3/30/1905 1 to 10 1.0 1.0 1.0 1.0
2 1,111 7/31/2017 6/6/1905 1 to 10 1.0 1.0 1.0 1.0
3 1,111 7/31/2018 6/6/1905 1 to 9 1.0 1.0 1.0 1.0
4 1,111 5/31/2019 12/4/1904 1 to 9 1.0 1.0 1.0 1.0
5 3,033 3/31/2015 5/18/1908 3 to 7 NaN NaN 1.0 1.0
6 3,033 3/31/2016 11/24/1905 3 to 7 NaN NaN 1.0 1.0
7 3,033 3/31/2017 1/20/1906 3 to 7 NaN NaN 1.0 1.0
8 3,033 3/31/2018 1/8/1906 2 to 7 NaN 1.0 1.0 1.0
9 3,033 4/4/2019 2200,0 2 to 8 NaN 1.0 1.0 1.0

Type 5 Type 6 Type 7 Type 8 Type 9 Type 10
0 1.0 1.0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0 1.0 1.0
2 1.0 1.0 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0 1.0 NaN
4 1.0 1.0 1.0 1.0 1.0 NaN
5 1.0 1.0 1.0 NaN NaN NaN
6 1.0 1.0 1.0 NaN NaN NaN
7 1.0 1.0 1.0 NaN NaN NaN
8 1.0 1.0 1.0 NaN NaN NaN
9 1.0 1.0 1.0 1.0 NaN NaN
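
The floats and NaN values in this output follow from how the per-row results are combined: each call returns a Series with its own index, and pandas aligns them on the union of all labels, leaving NaN wherever a row has no entry for a label. A minimal self-contained sketch on a three-row frame may make this easier to see (the frame demo and the helper name expand_range are illustrative and not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; only the 'active' column matters here.
demo = pd.DataFrame({'active': ['1 to 3', '2 to 4', '3 to 3']})

def expand_range(text):
    # "a to b" -> Series of 1s labelled a, a+1, ..., b
    lo, hi = map(int, text.split(' to '))
    return pd.Series(1, index=np.arange(lo, hi + 1))

# Rows are aligned on the union of labels (1..4); gaps become NaN.
wide = demo['active'].apply(expand_range).add_prefix('Type ')
print(wide)
```

Row 1 ("2 to 4") has no label 1, so its Type 1 cell is NaN, which is why a fillna step is needed to get clean 0/1 integers.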

Similarly, filling the missing values with 0 and casting to integers:

def f(x):
    a = list(map(int, x.split(' to ')))
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

df = df.join(df['active'].apply(f).add_prefix('Type ').fillna(0).astype(int))
print(df)
ID start_date end_date active Type 1 Type 2 Type 3 Type 4 \
0 1,111 6/30/2015 8/6/1904 1 to 10 1 1 1 1
1 1,111 6/28/2016 3/30/1905 1 to 10 1 1 1 1
2 1,111 7/31/2017 6/6/1905 1 to 10 1 1 1 1
3 1,111 7/31/2018 6/6/1905 1 to 9 1 1 1 1
4 1,111 5/31/2019 12/4/1904 1 to 9 1 1 1 1
5 3,033 3/31/2015 5/18/1908 3 to 7 0 0 1 1
6 3,033 3/31/2016 11/24/1905 3 to 7 0 0 1 1
7 3,033 3/31/2017 1/20/1906 3 to 7 0 0 1 1
8 3,033 3/31/2018 1/8/1906 2 to 7 0 1 1 1
9 3,033 4/4/2019 2200,0 2 to 8 0 1 1 1

Type 5 Type 6 Type 7 Type 8 Type 9 Type 10
0 1 1 1 1 1 1
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 0
4 1 1 1 1 1 0
5 1 1 1 0 0 0
6 1 1 1 0 0 0
7 1 1 1 0 0 0
8 1 1 1 0 0 0
9 1 1 1 1 0 0
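
On larger frames, calling a Python function once per row can be slow. One possible alternative, which is not part of the original answer, is to parse the bounds once and compare them against the column numbers using NumPy broadcasting. The sketch below assumes every value in active has the exact form "a to b" with integer bounds; demo, bounds, types and flags are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; assumes every value looks like "a to b".
demo = pd.DataFrame({'active': ['1 to 10', '3 to 7', '2 to 8']})

# Split into two integer columns: 0 = lower bound, 1 = upper bound.
bounds = demo['active'].str.split(' to ', expand=True).astype(int)
lo = bounds[0].to_numpy()[:, None]   # shape (n_rows, 1)
hi = bounds[1].to_numpy()[:, None]

# Column numbers covering the overall range, shape (n_types,).
types = np.arange(lo.min(), hi.max() + 1)

# Broadcasting gives an (n_rows, n_types) boolean mask; cast it to 0/1.
mask = (types >= lo) & (types <= hi)
flags = pd.DataFrame(mask.astype(int), index=demo.index,
                     columns=[f'Type {t}' for t in types])

result = demo.join(flags)
print(result)
```

This produces integer 0/1 columns directly, so no fillna or astype step is needed afterwards.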

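Another non-loop solution: the idea is to drop duplicate ranges, mark the two endpoints of each range with get_dummies, use reindex to add the missing columns, and finally multiply the cumsum by the reversed cumsum to fill in the values between the endpoints:

# One row per distinct 'active' value, with indicator columns for its two endpoints
df1 = (df.set_index('active', drop=False)
         .pop('active')
         .drop_duplicates()
         .str.get_dummies(' to '))

# Column labels come back as strings; convert to int and add the missing columns
df1.columns = df1.columns.astype(int)
df1 = df1.reindex(columns=np.arange(df1.columns.min(), df1.columns.max() + 1), fill_value=0)
# The product of the two cumsums is nonzero only between the endpoints.
# clip_upper was removed in pandas 1.0; clip(upper=1) is the replacement.
df1 = (df1.cumsum(axis=1) * df1.iloc[:, ::-1].cumsum(axis=1)).clip(upper=1)
print(df1)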
1 2 3 4 5 6 7 8 9 10
active
1 to 10 1 1 1 1 1 1 1 1 1 1
1 to 9 1 1 1 1 1 1 1 1 1 0
3 to 7 0 0 1 1 1 1 1 0 0 0
2 to 7 0 1 1 1 1 1 1 0 0 0
2 to 8 0 1 1 1 1 1 1 1 0 0

df = df.join(df1.add_prefix('Type '), on='active')
print (df)

      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0 1,111 6/30/2015 8/6/1904 1 to 10 1 1 1 1
1 1,111 6/28/2016 3/30/1905 1 to 10 1 1 1 1
2 1,111 7/31/2017 6/6/1905 1 to 10 1 1 1 1
3 1,111 7/31/2018 6/6/1905 1 to 9 1 1 1 1
4 1,111 5/31/2019 12/4/1904 1 to 9 1 1 1 1
5 3,033 3/31/2015 5/18/1908 3 to 7 0 0 1 1
6 3,033 3/31/2016 11/24/1905 3 to 7 0 0 1 1
7 3,033 3/31/2017 1/20/1906 3 to 7 0 0 1 1
8 3,033 3/31/2018 1/8/1906 2 to 7 0 1 1 1
9 3,033 4/4/2019 2200,0 2 to 8 0 1 1 1

Type 5 Type 6 Type 7 Type 8 Type 9 Type 10
0 1 1 1 1 1 1
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 0
4 1 1 1 1 1 0
5 1 1 1 0 0 0
6 1 1 1 0 0 0
7 1 1 1 0 0 0
8 1 1 1 0 0 0
9 1 1 1 1 0 0
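
To see why multiplying the two cumulative sums works, it may help to trace a single range. For "3 to 7", only the endpoint columns 3 and 7 are marked. The left-to-right cumsum is nonzero from column 3 onward, the right-to-left cumsum is nonzero from column 7 downward, so their product is nonzero exactly on columns 3 through 7. The sketch below builds the endpoint markers by hand instead of using get_dummies, and uses clip(upper=1) because clip_upper was removed in pandas 1.0; marks, fwd, bwd and filled are illustrative names.

```python
import numpy as np
import pandas as pd

# Endpoint markers for the single range "3 to 7" over columns 1..10.
marks = pd.DataFrame(0, index=['3 to 7'], columns=np.arange(1, 11))
marks.loc['3 to 7', [3, 7]] = 1

fwd = marks.cumsum(axis=1)                 # nonzero from column 3 onward
bwd = marks.iloc[:, ::-1].cumsum(axis=1)   # nonzero from column 7 downward

# Multiplication aligns on column labels, so the reversed column order
# of bwd does not matter. Endpoints yield 2, hence the clip to 1.
filled = (fwd * bwd).clip(upper=1)
print(filled)   # columns 3..7 hold 1, all other columns hold 0
```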

For this question (python - Create and populate Pandas DataFrame columns using two existing columns), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52089554/
