python - How can you POS-tag French text using the Hugging Face Transformers library?

Reposted by: 行者123  Updated: 2023-12-01 21:28:15

I am trying to POS-tag French text with the Hugging Face Transformers library. For English, I can pass in a sentence such as:

The weather is really great. So let us go for a walk.

and get this result:

    token   feature
0   The     DET
1   weather NOUN
2   is      AUX
3   really  ADV
4   great   ADJ
5   .       PUNCT
6   So      ADV
7   let     VERB
8   us      PRON
9   go      VERB
10  for     ADP
11  a       DET
12  walk    NOUN
13  .       PUNCT

Does anyone know how to achieve something similar for French?

This is the code I use for the English version in a Jupyter notebook:

!git clone https://github.com/bhoov/spacyface.git
!python -m spacy download en_core_web_sm

from transformers import pipeline
import numpy as np
import pandas as pd

nlp = pipeline('feature-extraction')
sequence = "The weather is really great. So let us go for a walk."
result = nlp(sequence)
# Display the shape of the embeddings for the sequence.
# In this case there are 16 tokens and the embedding size is 768.
np.array(result).shape

import sys
sys.path.append('spacyface')

from spacyface.aligner import BertAligner

alnr = BertAligner.from_pretrained("bert-base-cased")
tokens = alnr.meta_tokenize(sequence)
token_data = [{'token': tok.token, 'feature': tok.pos} for tok in tokens]
pd.DataFrame(token_data)

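The output of this notebook is shown above.

Best answer

We ended up training a part-of-speech (POS) tagging model with the Hugging Face Transformers library. The resulting model is available here: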

https://huggingface.co/gilf/french-postag-model?text=En+Turquie%2C+Recep+Tayyip+Erdogan+ordonne+la+reconversion+de+Sainte-Sophie+en+mosqu%C3%A9e

The web page linked above shows how the model assigns POS tags. If you have the Hugging Face Transformers library installed, you can try it in a Jupyter notebook with the following code:

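from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("gilf/french-postag-model")
model = AutoModelForTokenClassification.from_pretrained("gilf/french-postag-model")

# The 'ner' pipeline runs any token-classification model, including a POS tagger;
# grouped_entities=True groups adjacent tokens that share a tag into one entry
nlp_token_class = pipeline('ner', model=model, tokenizer=tokenizer, grouped_entities=True)
nlp_token_class('En Turquie, Recep Tayyip Erdogan ordonne la reconversion de Sainte-Sophie en mosquée')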

Here is the result on the console:

[{'entity_group': 'PONCT', 'score': 0.11994100362062454, 'word': '[CLS]'},
 {'entity_group': 'P', 'score': 0.9999570250511169, 'word': 'En'},
 {'entity_group': 'NPP', 'score': 0.9998692870140076, 'word': 'Turquie'},
 {'entity_group': 'PONCT', 'score': 0.9999769330024719, 'word': ','},
 {'entity_group': 'NPP', 'score': 0.9996993020176888, 'word': 'Recep Tayyip Erdogan'},
 {'entity_group': 'V', 'score': 0.9997997283935547, 'word': 'ordonne'},
 {'entity_group': 'DET', 'score': 0.9999586343765259, 'word': 'la'},
 {'entity_group': 'NC', 'score': 0.9999251365661621, 'word': 'reconversion'},
 {'entity_group': 'P', 'score': 0.9999709129333496, 'word': 'de'},
 {'entity_group': 'NPP', 'score': 0.9985082149505615, 'word': 'Sainte'},
 {'entity_group': 'PONCT', 'score': 0.9999614357948303, 'word': '-'},
 {'entity_group': 'NPP', 'score': 0.9461128115653992, 'word': 'Sophie'},
 {'entity_group': 'P', 'score': 0.9999079704284668, 'word': 'en'},
 {'entity_group': 'NC', 'score': 0.8998225331306458, 'word': 'mosquée [SEP]'}]
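
The raw output above still contains the special tokens `[CLS]` and `[SEP]`, and it is a list of dictionaries rather than the token/feature table shown in the question. The following is a minimal sketch of one way to reshape it, not part of the original answer. The helper names `build_tagger` and `to_token_rows` are hypothetical. `build_tagger` assumes a more recent Transformers release in which `aggregation_strategy="simple"` replaces the `grouped_entities=True` flag. The `sample` list is a shortened copy of the output above, so the reshaping step runs without downloading the model.

```python
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def build_tagger(model_name="gilf/french-postag-model"):
    """Build the tagger (hypothetical helper; downloads the model when called)."""
    # Imported lazily so the reshaping code below works without transformers.
    from transformers import pipeline

    # Assumption: the installed release supports aggregation_strategy,
    # the newer replacement for grouped_entities=True.
    return pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",
    )


def to_token_rows(results):
    """Turn pipeline output into {'token', 'feature'} rows without special tokens."""
    rows = []
    for item in results:
        # Drop special tokens that leaked into a grouped word, e.g. 'mosquée [SEP]'.
        words = [w for w in item["word"].split() if w not in SPECIAL_TOKENS]
        if not words:
            continue  # the entry consisted only of a special token, e.g. '[CLS]'
        rows.append({"token": " ".join(words), "feature": item["entity_group"]})
    return rows


# Shortened copy of the console output shown above.
sample = [
    {"entity_group": "PONCT", "score": 0.11994100362062454, "word": "[CLS]"},
    {"entity_group": "P", "score": 0.9999570250511169, "word": "En"},
    {"entity_group": "NPP", "score": 0.9998692870140076, "word": "Turquie"},
    {"entity_group": "NC", "score": 0.8998225331306458, "word": "mosquée [SEP]"},
]

rows = to_token_rows(sample)
for i, row in enumerate(rows):
    print(f"{i:<4}{row['token']:<10}{row['feature']}")
```

With the model available, `to_token_rows(build_tagger()("En Turquie, ..."))` would feed real predictions through the same step, and `pd.DataFrame(rows)` would give the tabular layout used in the question. The tag names (`P`, `NPP`, `NC`, and so on) come from the model itself and differ from the `DET`/`NOUN` style labels in the English example.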

A similar question about "python - How can you POS-tag French text using the Hugging Face Transformers library?" can be found on Stack Overflow: https://stackoverflow.com/questions/62782001/
