
python - Extracting nationalities and countries from text

Reposted. Author: 太空狗. Updated: 2023-10-29 22:25:27

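I want to use nltk to extract every mention of a country or nationality from a text. I POS-tag the text and keep every chunk labeled GPE, but the results are not satisfactory.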

import nltk

abstract = "Thyroid-associated orbitopathy (TO) is an autoimmune-mediated orbital inflammation that can lead to disfigurement and blindness. Multiple genetic loci have been associated with Graves' disease, but the genetic basis for TO is largely unknown. This study aimed to identify loci associated with TO in individuals with Graves' disease, using a genome-wide association scan (GWAS) for the first time to our knowledge in TO.Genome-wide association scan was performed on pooled DNA from an Australian Caucasian discovery cohort of 265 participants with Graves' disease and TO (cases) and 147 patients with Graves' disease without TO (controls). "

# Tokenize, POS-tag, then run NLTK's named-entity chunker.
sent = nltk.tokenize.wordpunct_tokenize(abstract)
pos_tag = nltk.pos_tag(sent)
nes = nltk.ne_chunk(pos_tag)

# Keep only the chunks labeled GPE (geo-political entity).
places = []
for ne in nes:
    if isinstance(ne, nltk.tree.Tree) and ne.label() == 'GPE':
        places.append(' '.join(i[0] for i in ne.leaves()))
if not places:
    places.append("N/A")

The result I get is:

['Thyroid', 'Australian', 'Caucasian', 'Graves']

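Some of these are nationalities, but others are just nouns.

So what am I doing wrong, or is there another way to extract this kind of information?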

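Best answer

After the helpful comments, I dug into different NER tools to find the one that best recognizes mentions of nationalities and countries, and found that spaCy has a NORP entity type that extracts nationalities effectively. https://spacy.io/docs/usage/entity-recognition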
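A minimal sketch of that approach follows. It is not code from the original answer: the helper names `extract_nationalities_and_places` and `run_with_spacy` are made up for illustration, and it assumes spaCy is installed together with an English pipeline such as `en_core_web_sm`, whose label set includes NORP (nationalities, religious or political groups) and GPE (countries, cities, states). Which entities are actually found depends on the model.

```python
# Entity labels of interest: NORP covers nationalities, GPE covers countries.
WANTED_LABELS = frozenset({"NORP", "GPE"})


def extract_nationalities_and_places(doc, labels=WANTED_LABELS):
    """Return unique (text, label) pairs, in order, for the wanted labels.

    `doc` is any object exposing `.ents`, where each entity has `.text`
    and `.label_` attributes (the shape of a spaCy Doc).
    """
    seen = set()
    results = []
    for ent in doc.ents:
        pair = (ent.text, ent.label_)
        if ent.label_ in labels and pair not in seen:
            seen.add(pair)
            results.append(pair)
    return results


def run_with_spacy(text, model_name="en_core_web_sm"):
    """Hypothetical wrapper: assumes spaCy and `model_name` are installed."""
    import spacy  # imported lazily so the helper above has no hard dependency

    nlp = spacy.load(model_name)
    return extract_nationalities_and_places(nlp(text))
```

Calling `run_with_spacy(abstract)` on the abstract from the question would return whatever NORP and GPE entities the chosen model detects; passing `labels={"NORP"}` to the helper restricts the result to nationalities only.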

For "python - Extracting nationalities and countries from text", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37886534/
