
python - How do I load a CountVectorizer from a list of n-grams?

Reposted. Author: 行者123. Updated: 2023-11-30 10:00:17

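I made a silly mistake and did not pickle my CountVectorizer. All I have instead is a list of every n-gram it generated, roughly 3,500 features.

My problem now is that I need to load the CountVectorizer model from that list of n-grams. Is there any way to do this? The list currently lives in a pd.DataFrame.

I was hoping I could do something like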

CV = CountVectorizer("loadMyListofnGrams")

Any help would be greatly appreciated!

Best answer

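You can do this by fitting a CountVectorizer on your list of n-grams. Note that fitting tokenizes each entry, so the component words of a multi-word n-gram (such as 'colored' from 'darkly colored') become features even if they are not in the list themselves.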

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

ngrams = ['coffee', 'darkly', 'darkly colored', 'bitter', 'stimulating',
          'drinks', 'stimulating drinks']

new_docs = [
    'Coffee is darkly colored, bitter, slightly acidic and '
    'has a stimulating effect in humans, primarily due to its '
    'caffeine content.[3] ',
    'It is one of the most popular drinks '
    'in the world,[4] and it can be prepared and presented in a '
    'variety of ways (e.g., espresso, French press, caffè latte). '
]

# Instantiate CountVectorizer and fit it on your n-grams
cv = CountVectorizer(ngram_range=(1, 2))
cv.fit(ngrams)
cv.vocabulary_

# Apply the vectorizer to new documents and display the dense matrix
# (.toarray() replaces the older .A shorthand)
counts = cv.transform(new_docs)
counts.toarray()

# Turn the results into a data frame
# (get_feature_names() was removed in scikit-learn 1.2)
counts_df = pd.DataFrame(counts.toarray(), columns=cv.get_feature_names_out())
counts_df

Output

cv.vocabulary_
Out[10]:
{'coffee': 1,
 'darkly': 3,
 'colored': 2,
 'darkly colored': 4,
 'bitter': 0,
 'stimulating': 6,
 'drinks': 5,
 'stimulating drinks': 7}

counts_df
Out[12]:
   bitter  coffee  colored  darkly  darkly colored  drinks  stimulating  \
0       1       1        1       1               1       0            1
1       0       0        0       0               0       1            0

   stimulating drinks
0                   0
1                   0

Regarding "python - How do I load a CountVectorizer from a list of n-grams?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59282495/
