
python - Extracting a candidate's name from a text file using Python and NLTK

Reposted. Author: 行者123. Updated: 2023-11-30 22:25:18

import re
import spacy
import nltk
from nltk.corpus import stopwords
stop = stopwords.words('english')
from nltk.corpus import wordnet

inputfile = open('inputfile.txt', 'r')
String= inputfile.read()
nlp = spacy.load('en_core_web_sm')

def candidate_name_extractor(input_string, nlp):
    input_string = str(input_string)

    doc = nlp(input_string)

    # Extract entities
    doc_entities = doc.ents

    # Subset to person type entities
    doc_persons = filter(lambda x: x.label_ == 'PERSON', doc_entities)
    doc_persons = filter(lambda x: len(x.text.strip().split()) >= 2, doc_persons)
    doc_persons = list(map(lambda x: x.text.strip(), doc_persons))
    print(doc_persons)
    # Assuming that the first Person entity with more than two tokens is the candidate's name
    candidate_name = doc_persons[0]
    return candidate_name

if __name__ == '__main__':
    names = candidate_name_extractor(String, nlp)

    print(names)

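I want to extract the candidate's name from a text file, but the code returns the wrong value. When I remove the list() wrapped around map, map does not work either and raises an error.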
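
For reference, the map error described above is expected in Python 3: map and filter return lazy iterators, which cannot be indexed, so doc_persons[0] only works once the result is wrapped in list(...). A minimal sketch of this, using made-up strings in place of spaCy PERSON entities (first_full_name is a hypothetical helper, not part of the question's code), which also guards against the case where nothing matches:

```python
def first_full_name(person_texts):
    """Return the first entry with at least two words, or None if there is none."""
    # A list comprehension builds a real list, so indexing it is safe.
    full_names = [t.strip() for t in person_texts if len(t.strip().split()) >= 2]
    return full_names[0] if full_names else None


# Made-up entity texts standing in for spaCy PERSON entities (assumption).
sample = ["Ravana", " John Smith ", "Jane Mary Doe"]

lazy = map(str.strip, sample)
try:
    lazy[0]  # map objects are iterators in Python 3 and do not support indexing
except TypeError as err:
    print("TypeError:", err)

print(first_full_name(sample))
print(first_full_name(["Ravana"]))
```

Returning None instead of indexing an empty list avoids an IndexError when the text contains no multi-word PERSON entity.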

Best answer

import re
import nltk
from nltk.corpus import stopwords
stop = stopwords.words('english')
from nltk.corpus import wordnet

String = 'Ravana was killed in a war'

# Split into sentences, tokenize each one, then POS-tag the tokens
Sentences = nltk.sent_tokenize(String)
Tokens = []
for Sent in Sentences:
    Tokens.append(nltk.word_tokenize(Sent))
Words_List = [nltk.pos_tag(Token) for Token in Tokens]

# Keep words whose tag starts with NN (NN, NNS, NNP, NNPS)
Nouns_List = []

for List in Words_List:
    for Word in List:
        if re.match(r'NN.*', Word[1]):
            Nouns_List.append(Word[0])

# Treat nouns that have no WordNet entry as likely names
Names = []
for Nouns in Nouns_List:
    if not wordnet.synsets(Nouns):
        Names.append(Nouns)

print(Names)

Check this code. I get Ravana as the output.
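
A note on the tag pattern used in the noun check: square brackets in a regular expression define a character class, so a pattern written as [NN.*] matches any single character that is N, . or *, rather than "NN followed by anything". A small sketch of the difference, using Penn Treebank tags of the kind nltk.pos_tag produces (the helper name is_noun_tag is made up for illustration):

```python
import re


def is_noun_tag(tag):
    """True for Penn Treebank noun tags: NN, NNS, NNP, NNPS."""
    return re.match(r'NN.*', tag) is not None


# The character class accepts the sentence-final punctuation tag '.',
# so a full stop would be collected as if it were a noun.
print(bool(re.match(r'[NN.*]', '.')))

# The plain prefix pattern rejects punctuation and accepts noun tags.
print(is_noun_tag('.'))
print(is_noun_tag('NNP'))
```

With the one-sentence example above there is no trailing full stop, so both patterns give the same result; the difference only shows up on text that contains punctuation.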

Edit:

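I created a text file with a few sentences from a résumé and used it as the input to the program. Only the changed part of the code is shown below: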

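import io

# Read the résumé text, closing the file when done
with io.open("Documents\\Temp.txt", 'r', encoding='utf-8') as File:
    String = File.read()
# Strip noise characters; inside [...] the | and + are literal and are removed too
String = re.sub(r'[/|.@%\d+]', '', String)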

It returns every name that is not in the WordNet corpus, such as my name, my house name, place names, and the name and location of my college.
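
To see what the cleanup step does before tokenizing, here is a minimal sketch on a made-up sample string. The helper name strip_noise is hypothetical, and the narrower character class it uses is one possible choice rather than the answer's exact pattern:

```python
import re


def strip_noise(text):
    """Remove slashes, dots, at-signs, percent signs and digits."""
    return re.sub(r'[/.@%\d]', '', text)


# Made-up résumé fragment (assumption, not from the original file).
sample = "Email: john.doe@example.com, Phone: 555-0100, Score: 90%"
print(strip_noise(sample))

# Inside a character class, | and + are ordinary characters, so a class
# that lists them removes pipes and plus signs as well.
print(re.sub(r'[/|.|@|%|\d+]', '', "C++ | Python"))
```

Because dots are removed, an e-mail address collapses into a single token, and such tokens may end up among the extracted names if they have no WordNet entry.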

Regarding python - Extracting a candidate's name from a text file using Python and NLTK, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47571213/
