
python - How to stack sentences within a pandas DataFrame while carrying their reference?

Reposted. Author: 行者123. Updated: 2023-11-30 22:44:38

I have a large pandas DataFrame that contains a large number of documents:

    id  text
1 doc2 Google i...
2 doc3 Amazon...
3 doc4 This was...
...
n docN nice camara...

How can I stack all the documents into sentences, with each sentence carrying the id of the document it came from?

    id  text
1 doc1 Google is a great company.
2 doc1 It is in silicon valley.
3 doc1 Their search engine is the best
4 doc2 Amazon is a great store.
5 doc2 it is located in Seattle.
6 doc2 its new product is alexa.
5 doc2 its expensive.
5 doc3 This was a great product.
...
n docN nice camara I really liked it.

I tried:

import nltk
def sentence(document):
    sentences = nltk.sent_tokenize(document.strip(' '))
    return sentences


df['sentece'] = df['text'].apply(sentence)
df.stack(level=0)

However, this did not work. Any idea how to stack the sentences so that they carry their origin?

Best answer

Here is a solution to a problem similar to yours: pandas: When cell contents are lists, create a row for each element in the list. This is my adaptation of it for your specific task:

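import nltk
import pandas as pd
# One list of sentences per document.
df['sents'] = df['text'].apply(nltk.sent_tokenize)
# One row per sentence, still indexed by the original row label.
s = df.apply(lambda x: pd.Series(x['sents']), axis=1).stack().\
    reset_index(level=1, drop=True)
s.name = 'sents'
# Re-attach the remaining columns (here: id) through the shared index.
df = df.drop(['sents','text'], axis=1).join(s)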
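
On pandas 0.25 or later, the same reshaping can probably be written more directly with `DataFrame.explode`, which turns each element of a list-valued cell into its own row and repeats the other columns. The sketch below is only an illustration on toy data: `split_sentences` is a hypothetical regex-based stand-in used so the example runs without downloading NLTK models, and the two sample documents are made up from the question's example. For real text, `nltk.sent_tokenize` would likely be the better splitter.

```python
import re

import pandas as pd


def split_sentences(text):
    # Hypothetical stand-in for nltk.sent_tokenize: split on whitespace
    # that follows '.', '!' or '?'. Too naive for abbreviations etc.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


# Toy data modeled on the question (assumed, not the asker's real frame).
df = pd.DataFrame({
    "id": ["doc1", "doc2"],
    "text": [
        "Google is a great company. It is in silicon valley.",
        "Amazon is a great store. it is located in Seattle. its expensive.",
    ],
})

out = (
    df.assign(text=df["text"].apply(split_sentences))  # list of sentences per row
      .explode("text")                                 # one row per sentence, id repeated
      .reset_index(drop=True)                          # fresh 0..n-1 row labels
)
print(out)
```

Each sentence ends up on its own row next to the `id` of the document it came from, which is the layout asked for in the question.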

Regarding "python - How to stack sentences within a pandas DataFrame while carrying their reference?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41472234/
