
python-3.x - How to pass a user-defined function to TfidfVectorizer.fit_transform()

Reposted. Author: 行者123. Updated: 2023-12-03 17:13:25

I have a text preprocessing function that simply removes stop words:

from nltk.tokenize import word_tokenize

# Relies on a global DataFrame `df` with a 'text' column
# and a global list of stop words named `stopwords`.
def text_preprocessing():
    df['text'] = df['text'].apply(word_tokenize)
    df['text'] = df['text'].apply(lambda x: [item for item in x if item not in stopwords])
    new_array = []
    for keywords in df['text']:  # converts list of words into string
        P = " ".join(str(x) for x in keywords)
        new_array.append(P)
    df['text'] = new_array
    return df['text']

I want to pass text_preprocessing() into another function, tf_idf(), which produces the feature matrix. This is basically what I did:

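import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def tf_idf():
    tfidf = TfidfVectorizer()
    feature_array = tfidf.fit_transform(text_preprocessing)  # passes the function object itself, which raises the TypeError below
    # get_feature_names_out() needs scikit-learn >= 1.0; older versions use get_feature_names()
    keywords_data = pd.DataFrame(feature_array.toarray(), columns=tfidf.get_feature_names_out())
    return keywords_data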

I get an error: TypeError: 'function' object is not iterable

Best answer

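There is no need to build a separate stop-word removal function: you can pass your custom list of stop words directly to TfidfVectorizer through its stop_words parameter. As the example below shows, "test" is excluded from the Tf-idf vocabulary.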

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Setting up
numbers = np.random.randint(1, 5, 3)  # random, so your values will differ
text = ['This is a test.', 'Is this working?', "Let's see."]
df = pd.DataFrame({'text': text, 'numbers': numbers})

# Define custom stop words and instantiate TfidfVectorizer with them
my_stopwords = ['test']  # the list can be longer
tfidf = TfidfVectorizer(stop_words=my_stopwords)
text_tfidf = tfidf.fit_transform(df['text'])

# Optional - concatenating tfidf with df
# get_feature_names_out() needs scikit-learn >= 1.0; older versions use get_feature_names()
df_tfidf = pd.DataFrame(text_tfidf.toarray(), columns=tfidf.get_feature_names_out())
df = pd.concat([df, df_tfidf], axis=1)

# Initial df
df
Out[133]:
   numbers              text
0        2   This is a test.
1        4  Is this working?
2        3        Let's see.

tfidf.vocabulary_
Out[134]: {'this': 3, 'is': 0, 'working': 4, 'let': 1, 'see': 2}
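The vocabulary contains neither "test" (removed as a stop word) nor "a" or the "s" from "Let's": scikit-learn's default token_pattern, `r"(?u)\b\w\w+\b"`, keeps only tokens of two or more word characters. `build_analyzer()` returns the callable the vectorizer applies to each document (lowercasing, tokenization and stop-word filtering), so a short sketch like the following can be used to inspect what each document contributes:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words=['test'])
analyze = tfidf.build_analyzer()  # preprocessing + tokenization + stop-word removal

for doc in ['This is a test.', 'Is this working?', "Let's see."]:
    print(doc, '->', analyze(doc))
# This is a test. -> ['this', 'is']
# Is this working? -> ['is', 'this', 'working']
# Let's see. -> ['let', 'see']
```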

# Final df
df
Out[136]:
   numbers              text        is       let       see      this   working
0        2   This is a test.  0.707107  0.000000  0.000000  0.707107  0.000000
1        4  Is this working?  0.517856  0.000000  0.000000  0.517856  0.680919
2        3        Let's see.  0.000000  0.707107  0.707107  0.000000  0.000000

We found a similar question on Stack Overflow regarding python-3.x - How to pass a user-defined function to TfidfVectorizer.fit_transform(): https://stackoverflow.com/questions/51349829/
