
python - Pandas GroupBy String joining the column names instead of the column values

Reposted. Author: 行者123. Updated: 2023-12-01 01:30:37

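I am trying to group a DataFrame made up of DocID and strings, using this SO post as a guide. Instead of a data frame with one row per DocID and all of that document's string values joined by spaces, I end up with a column that contains the column names.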

Can someone point out my mistake?

Sample data

StringDF.head()

DocID LessStopWords
0 dd9ae7c8-7e98-4539-ab81-24c4780a6756 judgment of the court chamber
1 dd9ae7c8-7e98-4539-ab81-24c4780a6756 the request proceedings
2 dd9ae7c8-7e98-4539-ab81-24c4780a6756 legal context law
3 dd9ae7c8-7e98-4539-ab81-24c4780a6756 article 1 directive
4 dd9ae7c8-7e98-4539-ab81-24c4780a6756 the status taken

My code

DocsForTopicModel=StringDF.groupby(['DocID'],as_index=False).agg(lambda x : ' '.join(x))

My output

     DocID                                  LessStopWords
0 010b158d-8c0b-49ad-9340-774893e4f62f DocID LessStopWords
1 02874037-416d-4b91-8e2d-1a288b8c3a7b DocID LessStopWords
2 05b9ea7b-b5f0-4757-854c-b303a295f606 DocID LessStopWords
3 06f87756-4dbe-4199-a8e2-b504451e823a DocID LessStopWords
4 070bd4d1-6830-447e-9042-12c6def18822 DocID LessStopWords

My desired output

     DocID                                      LessStopWords
0 010b158d-8c0b-49ad-9340-774893e4f62f judgment of the court chamber the request proceedings legal context law article 1 directive
1 02874037-416d-4b91-8e2d-1a288b8c3a7b ...

Best answer
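The output in the question is what you get when `' '.join` is applied to a DataFrame rather than to a column: iterating over a DataFrame yields its column labels. Whether `agg` hands the lambda a whole sub-DataFrame or one column at a time has varied between pandas versions, so treat that as the likely cause rather than a confirmed one. The sketch below uses a small hypothetical stand-in for `StringDF` (shortened IDs, names of my own choosing) to show the effect and one way around it, which is to select the column before aggregating.

```python
import pandas as pd

# Hypothetical stand-in for StringDF, with shortened DocID values.
string_df = pd.DataFrame({
    "DocID": ["doc-a", "doc-a", "doc-b"],
    "LessStopWords": [
        "judgment of the court chamber",
        "the request proceedings",
        "legal context law",
    ],
})

# Iterating over a DataFrame yields column labels, so joining the frame
# itself concatenates the column names, not the cell values.
joined_labels = " ".join(string_df)
print(joined_labels)

# Selecting the column first means the function receives a Series of
# strings for each group, so the values are what gets joined.
joined_values = string_df.groupby("DocID")["LessStopWords"].agg(" ".join)
print(joined_values)
```

Selecting `['LessStopWords']` before calling `agg` removes the ambiguity about what the function receives.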

You can also use .str.cat(sep=' ') to do the concatenation:

>>> df.groupby('DocID')['LessStopWords'].apply(lambda ser: ser.str.cat(sep=' '))
DocID
dd9ae7c8-7e98-4539-ab81-24c4780a6756 judgment of the court chamber the request proc...
Name: LessStopWords, dtype: object

For more examples, see Working with Text Data.


A larger example:

>>> import string
>>> import uuid
>>>
>>> import numpy as np
>>> import pandas as pd
>>>
>>> uids = np.random.choice([uuid.uuid4() for _ in range(3)], size=10)
>>> words = np.random.choice(list(string.ascii_letters), size=10)
>>>
>>> df = pd.DataFrame({'DocID': uids, 'LessStopWords': words})
>>> df
DocID LessStopWords
0 8ec3faf7-a771-4e50-87d7-127a69d4d738 p
1 0befc0aa-9311-4154-bced-00a280c99cdd q
2 8ec3faf7-a771-4e50-87d7-127a69d4d738 t
3 de1021d3-ce47-4f56-8e4d-47d389473dd6 j
4 0befc0aa-9311-4154-bced-00a280c99cdd L
5 8ec3faf7-a771-4e50-87d7-127a69d4d738 t
6 de1021d3-ce47-4f56-8e4d-47d389473dd6 g
7 0befc0aa-9311-4154-bced-00a280c99cdd D
8 0befc0aa-9311-4154-bced-00a280c99cdd d
9 8ec3faf7-a771-4e50-87d7-127a69d4d738 J
>>> df.groupby('DocID')['LessStopWords'].apply(lambda ser: ser.str.cat(sep=' '))
DocID
0befc0aa-9311-4154-bced-00a280c99cdd q L D d
8ec3faf7-a771-4e50-87d7-127a69d4d738 p t t J
de1021d3-ce47-4f56-8e4d-47d389473dd6 j g
Name: LessStopWords, dtype: object
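The result above is a Series indexed by DocID, whereas the desired output in the question is a two-column DataFrame. A minimal sketch of getting that shape, assuming the same column names and using fixed, made-up IDs in place of the random UUIDs, is to chain `reset_index()`:

```python
import pandas as pd

# Fixed, made-up data so the result is reproducible (the example above is random).
df = pd.DataFrame({
    "DocID": ["b", "a", "b", "a", "c"],
    "LessStopWords": ["q", "p", "L", "t", "j"],
})

result = (
    df.groupby("DocID")["LessStopWords"]
      .apply(lambda ser: ser.str.cat(sep=" "))  # one joined string per DocID
      .reset_index()                            # move DocID from the index back to a column
)
print(result)
```

By default `groupby` sorts the group keys, while the rows within each group keep their original order, so the words are joined in the order in which they appear in the frame.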

Regarding python - Pandas GroupBy String joining the column names instead of the column values, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52907995/
