
python - Distinguishing countries and cities in spaCy NER

Reposted. Author: 行者123. Updated: 2023-11-30 21:54:00

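I am trying to extract countries from organization addresses using spaCy NER, but it tags both countries and cities with the same label, GPE. Is there any way to tell them apart?

For example: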

import en_core_web_sm

nlp = en_core_web_sm.load()

doc = nlp('Resilience Engineering Institute, Tempe, AZ, United States; Naval Postgraduate School, Department of Operations Research, Monterey, CA, United States; Arizona State University, School of Sustainable Engineering and the Built Environment, Tempe, AZ, United States; Arizona State University, School for the Future of Innovation in Society, Tempe, AZ, United States')

# Print every entity tagged as a geopolitical entity (GPE)
for ent in doc.ents:
    if ent.label_ == 'GPE':
        print(ent.text)

This returns:

Tempe
AZ
United States
United States
Tempe
AZ
United States
Tempe
AZ
United States

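Best answer

As the other answers mention, GPE in the pretrained spaCy models covers countries, cities, and states. There is a workaround, though, and I'm sure several approaches could work.

One approach: you can add custom labels to the model. There is a good article on Towards Data Science that can help you do this. Collecting training data for it can be tedious, because you need to tag each city or country by its position in the sentence. To quote an answer from Stack Overflow: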

Spacy NER model training includes the extraction of other "implicit" features, such as POS and surrounding words.

When you try to train it on single words, it cannot get features general enough to detect those entities.
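To make the labelling effort concrete, here is a minimal sketch of how character-offset annotations could be generated from full address strings rather than isolated words. The helper name `annotate`, the labels `CITY` and `COUNTRY`, and the small lookup table are all hypothetical. The `(text, {"entities": [(start, end, label)]})` shape is the one commonly used for spaCy training examples, but check it against the spaCy version you use.

```python
import re

# Hypothetical lookup table; in practice this would come from a gazetteer.
KNOWN = {
    "Tempe": "CITY",
    "Monterey": "CITY",
    "United States": "COUNTRY",
}

def annotate(text, known=KNOWN):
    """Return (text, {"entities": [(start, end, label), ...]}) for every
    whole-word occurrence of a known name in text."""
    entities = []
    for name, label in known.items():
        # \b keeps "Tempe" from matching inside a longer word.
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            entities.append((match.start(), match.end(), label))
    entities.sort()  # order spans by their start offset
    return text, {"entities": entities}

example = annotate("Naval Postgraduate School, Monterey, CA, United States")
print(example)
```

Because each name is annotated inside its full sentence, the model still sees the surrounding words that the quoted answer says it needs.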

A simpler workaround might be the following:

Install geonamescache

pip install geonamescache

Then use the following code to get the lists of countries and cities

import geonamescache

gc = geonamescache.GeonamesCache()

# gets nested dictionary for countries
countries = gc.get_countries()

# gets nested dictionary for cities
cities = gc.get_cities()

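The documentation states that many other kinds of location data are available as well.

Use the following function to get all values of keys with a given name from a nested dictionary (taken from this answer)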

def gen_dict_extract(var, key):
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, (dict, list)):
                yield from gen_dict_extract(v, key)
    elif isinstance(var, list):
        for d in var:
            yield from gen_dict_extract(d, key)

Load the city and country names into two separate lists.

cities = [*gen_dict_extract(cities, 'name')]
countries = [*gen_dict_extract(countries, 'name')]
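Membership tests on a list scan every element, so if these names are going to be looked up repeatedly, sets may be a better fit. The sketch below assumes that `get_countries()` and `get_cities()` each return a dictionary whose values are records with a `name` key, which is what the extraction above relies on. The helper `names_of` and the toy records are illustrative only, not real geonamescache entries.

```python
def names_of(records):
    """Collect the 'name' field of each record into a set for fast lookups."""
    return {record["name"] for record in records.values() if "name" in record}

# Toy data shaped like the assumed geonamescache output (not real entries).
toy_countries = {"US": {"name": "United States", "iso": "US"}}
toy_cities = {
    "1": {"name": "Tempe", "countrycode": "US"},
    "2": {"name": "Monterey", "countrycode": "US"},
    "3": {"name": "Tempe", "countrycode": "XX"},  # duplicate names collapse
}

country_names = names_of(toy_countries)
city_names = names_of(toy_cities)
print(sorted(city_names))
```

With the real data this would be called as `names_of(gc.get_cities())`, under the same assumption about the record shape.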

Then use the following code to tell them apart:

import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp('Resilience Engineering Institute, Tempe, AZ, United States; Naval Postgraduate School, Department of Operations Research, Monterey, CA, United States; Arizona State University, School of Sustainable Engineering and the Built Environment, Tempe, AZ, United States; Arizona State University, School for the Future of Innovation in Society, Tempe, AZ, United States')

# Check each GPE entity against the country names first, then the city names
for ent in doc.ents:
    if ent.label_ == 'GPE':
        if ent.text in countries:
            print(f"Country : {ent.text}")
        elif ent.text in cities:
            print(f"City : {ent.text}")
        else:
            print(f"Other GPE : {ent.text}")

Output:

City : Tempe
Other GPE : AZ
Country : United States
Country : United States
City : Tempe
Other GPE : AZ
Country : United States
City : Tempe
Other GPE : AZ
Country : United States
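
The `Other GPE` lines in this output are US state abbreviations. The classifier sketched below adds a third lookup for them and normalizes case before comparing. `classify_gpe` is a hypothetical helper, and the name sets are passed in as plain Python sets so the function does not depend on any particular gazetteer. If you use geonamescache, `get_us_states()` should provide state codes and names, but verify that against the version you have installed.

```python
def classify_gpe(text, countries, cities, states=frozenset()):
    """Return 'Country', 'State', 'City', or 'Other GPE' for an entity string.

    Countries are checked first, so a name that is both a country and a
    city (for example "Singapore") is reported as a country.
    """
    key = text.strip().casefold()
    if key in countries:
        return "Country"
    if key in states:
        return "State"
    if key in cities:
        return "City"
    return "Other GPE"

# Toy, lower-cased name sets standing in for real gazetteer data.
countries = {"united states"}
cities = {"tempe", "monterey"}
states = {"az", "ca", "arizona", "california"}

for ent_text in ["Tempe", "AZ", "United States", "Atlantis"]:
    print(f"{classify_gpe(ent_text, countries, cities, states)} : {ent_text}")
```

All three sets must be lower-cased with `casefold()` when they are built, since the function compares the case-folded entity text against them.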

For more on python - Distinguishing countries and cities in spaCy NER, see the similar question on Stack Overflow: https://stackoverflow.com/questions/59444065/
