python - Sentence tokenizer - spaCy to pandas

Reposted. Author: 行者123. Updated: 2023-12-01 02:17:37

I am using spaCy to split a text into sentences and write the result into a pandas DataFrame.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import spacy
import pandas as pd

# Load the small English pipeline (installed separately as en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Read the text file as UTF-8 and process it
with open('o.txt', encoding='utf8') as f:
    doc = nlp(f.read())

for idno, sentence in enumerate(doc.sents):
    print('Sentence {}:'.format(idno + 1), sentence)

# Each sentence is a spaCy Span; pandas expands it into one column per token
sentences = list(doc.sents)
df = pd.DataFrame(sentences)
print(df)

Output:

Sentence 1: This is a sample sentence.
Sentence 2: This is a second sample sentence.
Sentence 3: This is a third sample sentence.
      0   1  2       3         4         5     6
0  This  is  a  sample  sentence         .  None
1  This  is  a  second    sample  sentence     .
2  This  is  a   third    sample  sentence     .
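The stray columns come from how pandas builds a DataFrame from a list: each list-like item is treated as one row of cells, and a spaCy Span is list-like because iterating over it yields its tokens. A minimal pandas-only sketch of the same effect, with plain lists of strings standing in for the Span objects (an assumption made so the example runs without spaCy):

```python
import pandas as pd

# Plain lists of strings stand in for spaCy Span objects here: like a Span,
# each one is a sequence of tokens, and the first is one token shorter
rows = [
    ['This', 'is', 'a', 'sample', 'sentence', '.'],
    ['This', 'is', 'a', 'second', 'sample', 'sentence', '.'],
]

df = pd.DataFrame(rows)  # each element of a row becomes a separate column
print(df.shape)          # (2, 7): one column per token position
print(df)                # the shorter row is padded with a missing value in column 6
```

This is also why the first row of the output ends in `None`: its sentence has one token fewer than the others.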

Expected pandas output:

    0
0 This is a sample sentence.
1 This is a second sample sentence.
2 This is a third sample sentence.

How can I get the expected output?

Best answer

You should be able to use pd.read_table(input_file_path) and adjust its parameters so that the text is imported into a single column; let's call it df['text'].
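A minimal sketch of that import step. The function name `read_text_column`, the temporary stand-in file, and the use of `'\x01'` as a separator that is assumed never to occur in the text are all illustrative assumptions; each non-blank line of the file becomes one row:

```python
import csv
import os
import tempfile

import pandas as pd


def read_text_column(path):
    """Hypothetical helper: read a text file into a 'text' column, one non-blank line per row."""
    return pd.read_table(
        path,
        sep='\x01',              # assumed never to appear in the text, so each line stays whole
        header=None,             # the file has no header row
        names=['text'],          # name the single column 'text'
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        dtype=str,               # keep lines such as "42" as strings
        keep_default_na=False,   # keep lines such as "NA" as text instead of NaN
        encoding='utf8',
    )


# Hypothetical stand-in for the question's o.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf8') as tmp:
    tmp.write('This is a sample sentence. This is a second sample sentence.\n')
    tmp.write('This is a third sample sentence.\n')

df = read_text_column(tmp.name)
os.remove(tmp.name)
print(df)
```

Blank lines are skipped by default (`skip_blank_lines=True`), so paragraphs separated by empty lines each land in their own row.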

Then try this:

df['sents'] = df['text'].apply(lambda x: list(nlp(x).sents))

You will then have a new column in which each row holds a list of that row's sentences (spaCy Span objects).
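Each cell of `df['sents']` then holds a list, so one more step is needed to reach the one-sentence-per-row layout asked for in the question. A sketch, assuming spaCy v3 and pandas 0.25 or later, with a blank English pipeline plus the rule-based `sentencizer` standing in for `en_core_web_sm` and two hard-coded lines standing in for the file; it stores `sent.text` strings rather than Span objects and then flattens the lists with `Series.explode`:

```python
import pandas as pd
import spacy

# Assumption: a blank English pipeline with the rule-based sentencizer
# stands in for en_core_web_sm, so no model download is needed (spaCy v3 API)
nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')

# Two hard-coded lines stand in for the rows read from the file
df = pd.DataFrame({'text': [
    'This is a sample sentence. This is a second sample sentence.',
    'This is a third sample sentence.',
]})

# Store plain strings (Span.text) rather than Span objects
df['sents'] = df['text'].apply(lambda x: [sent.text for sent in nlp(x).sents])

# explode() puts each list element on its own row; reset_index renumbers 0..n-1
sentences = df['sents'].explode().reset_index(drop=True).to_frame(name=0)
print(sentences)
```

Keeping strings instead of Span objects also makes the column easy to save with `to_csv`, since a Span would otherwise be written using its string form anyway.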

Good luck!
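If the whole file is already in a single `Doc`, as in the question's code, a shorter route (not part of the original answer) is to turn each Span into a plain string before building the DataFrame, so pandas sees one value per row instead of a sequence of tokens. A sketch, assuming spaCy v3, with a blank English pipeline and the rule-based `sentencizer` standing in for `en_core_web_sm` and a hard-coded string standing in for the contents of `o.txt`:

```python
import pandas as pd
import spacy

# Assumption: blank pipeline + sentencizer instead of en_core_web_sm (spaCy v3 API)
nlp = spacy.blank('en')
nlp.add_pipe('sentencizer')

doc = nlp('This is a sample sentence. This is a second sample sentence. '
          'This is a third sample sentence.')

# Span.text is a plain string, so pandas stores it as a single cell
# instead of expanding the Span into one column per token
df = pd.DataFrame([sent.text for sent in doc.sents])
print(df)
```

With `en_core_web_sm` the sentence boundaries may come from the parser rather than from punctuation rules, but the `sent.text` conversion is the same.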

Regarding "python - Sentence tokenizer - spaCy to pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48249291/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号