
python - How do I change my code so the string is not converted to float

Reposted. Author: 行者123. Updated: 2023-12-01 06:44:55

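I am trying to write code that detects fake news. Unfortunately, I keep getting the same error message. Could someone explain where I went wrong? I took some lines of code from https://data-flair.training/blogs/advanced-python-project-detecting-fake-news/ and some from https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk. When I tried to combine the two pieces of code (removing the duplicated parts), I got an error message.

Code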

%matplotlib inline
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import itertools
import json
import csv
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

df = pd.read_csv(r"C:\Users\johnrambo\Downloads\fake_news(1).csv", sep=',', header=0, engine='python', escapechar='\\')

X_train, X_test, y_train, y_test = train_test_split(df['headline'], is_sarcastic_1, test_size = 0.2, random_state = 7)

clf = MultinomialNB().fit(X_train, y_train)

predicted = clf.predict(X_test)

print("MultinomialNB Accuracy:", metrics.accuracy_score(y_test, predicted))

Error

ValueError                                Traceback (most recent call last)
<ipython-input-8-e1f11a702626> in <module>
21 X_train, X_test, y_train, y_test = train_test_split(df['headline'], is_sarcastic_1, test_size = 0.2, random_state = 7)
22
---> 23 clf = MultinomialNB().fit(X_train, y_train)
24
25 predicted = clf.predict(X_test)

~\Anaconda\lib\site-packages\sklearn\naive_bayes.py in fit(self, X, y, sample_weight)
586 self : object
587 """
--> 588 X, y = check_X_y(X, y, 'csr')
589 _, n_features = X.shape
590

~\Anaconda\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
717 ensure_min_features=ensure_min_features,
718 warn_on_dtype=warn_on_dtype,
--> 719 estimator=estimator)
720 if multi_output:
721 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

~\Anaconda\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
494 try:
495 warnings.simplefilter('error', ComplexWarning)
--> 496 array = np.asarray(array, dtype=dtype, order=order)
497 except ComplexWarning:
498 raise ValueError("Complex data not supported\n"

~\Anaconda\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
536
537 """
--> 538 return array(a, dtype, copy=False, order=order)
539
540

~\Anaconda\lib\site-packages\pandas\core\series.py in __array__(self, dtype)
946 warnings.warn(msg, FutureWarning, stacklevel=3)
947 dtype = "M8[ns]"
--> 948 return np.asarray(self.array, dtype)
949
950 # ----------------------------------------------------------------------

~\Anaconda\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
536
537 """
--> 538 return array(a, dtype, copy=False, order=order)
539
540

~\Anaconda\lib\site-packages\pandas\core\arrays\numpy_.py in __array__(self, dtype)
164
165 def __array__(self, dtype=None):
--> 166 return np.asarray(self._ndarray, dtype=dtype)
167
168 _HANDLED_TYPES = (np.ndarray, numbers.Number)

~\Anaconda\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
536
537 """
--> 538 return array(a, dtype, copy=False, order=order)
539
540

ValueError: could not convert string to float: 'experts caution new car loses 90% of value as soon as you drive it off cliff'

First few rows of the data


This is what I get when I run df.head().to_dict():

{'is_sarcastic': {0: 1, 1: 0, 2: 0, 3: 1, 4: 1}, 'headline': {0: 'thirtysomething scientists unveil doomsday clock of hair loss', 1: 'dem rep. totally nails why congress is falling short on gender, racial equality', 2: 'eat your veggies: 9 deliciously different recipes', 3: 'inclement weather prevents liar from getting to work', 4: "mother comes pretty close to using word 'streaming' correctly"}, 'article_link': {0: 'https://www.theonion.com/thirtysomething-scientists-unveil-doomsday-clock-of-hai-1819586205', 1: 'https://www.huffingtonpost.com/entry/donna-edwards-inequality_us_57455f7fe4b055bb1170b207', 2: 'https://www.huffingtonpost.com/entry/eat-your-veggies-9-delici_b_8899742.html', 3: 'https://local.theonion.com/inclement-weather-prevents-liar-from-getting-to-work-1819576031', 4: 'https://www.theonion.com/mother-comes-pretty-close-to-using-word-streaming-cor-1819575546'}}

Best answer

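I assume your df['headline'] column contains text data. MultinomialNB.fit tries to convert its input to a numeric array, which is why the raw headline strings raise "could not convert string to float". You need a couple of steps: first convert the text into a numeric representation, then pass that representation to the machine learning model.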

You may want to look at sklearn's CountVectorizer and TfidfTransformer.
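
As a rough sketch of what the answer suggests, the vectorizing steps can be chained in front of the classifier with a scikit-learn Pipeline. The tiny DataFrame below is made-up stand-in data so the snippet runs on its own; it only borrows the column names headline and is_sarcastic from the question. Using df['is_sarcastic'] as the label is an assumption, since the question does not show how is_sarcastic_1 was defined.

```python
import pandas as pd
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up rows standing in for the question's CSV (illustrative only).
df = pd.DataFrame({
    "headline": [
        "area man wins argument with mirror",
        "senate passes budget bill after long debate",
        "local cat elected mayor of empty street",
        "new study links exercise to better sleep",
        "nation demands nap in emergency referendum",
        "city council approves new bike lanes",
        "scientists confirm monday is too long",
        "company reports quarterly earnings growth",
        "man bravely reads terms and conditions",
        "school district announces new lunch menu",
    ],
    "is_sarcastic": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Assumption: the label is the is_sarcastic column of the same DataFrame.
X_train, X_test, y_train, y_test = train_test_split(
    df["headline"], df["is_sarcastic"], test_size=0.2, random_state=7
)

# Text -> token counts -> TF-IDF weights -> Naive Bayes.
clf = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])

# The pipeline accepts raw strings because the vectorizer runs first.
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)

print("MultinomialNB Accuracy:", metrics.accuracy_score(y_test, predicted))
```

The question already imports TfidfVectorizer, which combines the first two steps, so Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())]) should behave equivalently. Either way, the vectorizer is fitted on the training split only and reused on the test split.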

Regarding "python - How do I change my code so the string is not converted to float", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59273860/
