
python - Pandas.mean() TypeError: Could not convert to numeric

Reposted. Author: 行者123. Updated: 2023-12-04 13:47:16

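I am working on a project where I import data from SQL into a Pandas DataFrame. The import itself seems to go fine, but when I call pandas.mean() it throws a TypeError saying that a concatenated list of the values could not be converted to numeric (see below):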

Sample DataFrame:

df =
ProductSKU OverallHeight-ToptoBottom
0 AAI2185 74.5
1 AAI2275 47
2 AAI2686 56.5
3 AASA1002 73.23
4 AASA1032 39.37
5 AASA1039 72.44
6 AASA1099 75.6
7 AASA1101 38
8 ABCM1910 69
9 ABCM1980 72

Function call:
def summarizeTagData(df, tag):
    avgValue = df.loc[:, tag].dropna().mean()  # <--- Breaks here
    stdevValue = df.loc[:, tag].dropna().std()
    lowerBound = max(avgValue - (3 * stdevValue), 0)
    upperBound = avgValue + (3 * stdevValue)
    outsideRangeCount = df[df[tag] > upperBound].shape[0]
    missingDataCount = df[df[tag].isnull()].shape[0]
    dataDict = {
        "Average": avgValue,
        "StDev": stdevValue,
        "UpperBound": upperBound,
        "LowerBound": lowerBound,
        "OutsideRange": outsideRangeCount,
        "MissingData": missingDataCount,
    }
    return dataDict

Console output:
summarizeTagData(df, 'OverallHeight-ToptoBottom')
Traceback (most recent call last):

File "<ipython-input-22-f1f26a0a0520>", line 1, in <module>
summarizeTagData(df, 'OverallHeight-ToptoBottom')

File "C:/Users/tmori/Google Drive/Projects/Product Dimension Accuracy/ProductDataTag_Analysis.py", line 23, in summarizeTagData
avgValue = df.loc[:,tag].dropna().mean()

File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 5310, in stat_func
numeric_only=numeric_only)

...

File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 293, in nanmean
the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))

File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 743, in _ensure_numeric
raise TypeError('Could not convert %s to numeric' % str(x))

TypeError: Could not convert 74.54756.573.2339.3772.4475.6386972 to numeric

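The strangest part (and the one I cannot figure out) is that when I import the same data from a CSV file, it works perfectly. It only breaks when I load the data through SQL. Am I doing something wrong there?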

Best,
Tom

Best answer

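As the console output shows, the problem lies in the DataFrame column 'OverallHeight-ToptoBottom'.
My guess, as @Warren Weckesser commented, is that the column contains strings. The value in the error message, 74.54756.573.23..., is what you get when the strings "74.5", "47", "56.5", "73.23" and so on are added together, which means they were concatenated rather than summed. To check the column's data type, run: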

print(df['OverallHeight-ToptoBottom'].dtype) 
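To see why the CSV load and the SQL load can behave differently, here is a minimal sketch. It assumes the height column is stored as text on the SQL side, which the question does not confirm. The in-memory SQLite database and the table and column names (products, sku, height) are hypothetical and used only for illustration.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical table: the height column is declared as TEXT, so the
# database hands the values back as Python strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, height TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("AAI2185", "74.5"), ("AAI2275", "47"), ("AAI2686", "56.5")],
)
df_sql = pd.read_sql_query("SELECT sku, height FROM products", conn)
conn.close()

# The same values loaded from CSV: read_csv infers a numeric dtype.
csv_text = "sku,height\nAAI2185,74.5\nAAI2275,47\nAAI2686,56.5\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_sql["height"].dtype)  # a string/object dtype, not numeric
print(df_csv["height"].dtype)  # float64


def mean_fails(series):
    """Return True if calling .mean() on the series raises a TypeError."""
    try:
        series.mean()
    except TypeError:
        return True
    return False


print(mean_fails(df_sql["height"]))
print(mean_fails(df_csv["height"]))
```

read_csv parses the text and infers numeric types, while a SQL read generally keeps whatever type the driver returns, so a text or unmapped column can arrive as strings.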
If the dtype is object (that is, the column holds strings), converting the column to float should solve the problem. Use pandas.to_numeric for this:
df["OverallHeight-ToptoBottom"] = pd.to_numeric(df["OverallHeight-ToptoBottom"], downcast="float")
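If the column may also contain values that cannot be parsed as numbers, the conversion above will raise an error. The sketch below shows one possible way to handle that with errors="coerce", which turns unparseable entries into NaN. The sample values, including the "N/A" placeholder, are made up for illustration. It also shows that downcast="float" yields float32 rather than the default float64.

```python
import pandas as pd

# Made-up sample: two numeric strings, one placeholder, one missing value.
raw = pd.Series(["74.5", "47", "N/A", None], dtype=object, name="height")

# errors="coerce" converts anything unparseable to NaN instead of raising.
heights = pd.to_numeric(raw, errors="coerce")

n_missing = int(heights.isna().sum())  # entries that are NaN after conversion
avg = heights.mean()                   # NaN values are skipped by default

print(heights.dtype, n_missing, avg)

# downcast="float" picks the smallest float dtype, with float32 as the minimum.
small = pd.to_numeric(pd.Series(["74.5", "47"], dtype=object), downcast="float")
print(small.dtype)
```

Counting the NaN values after coercion is a quick way to see how many rows were not valid numbers before relying on the mean or standard deviation.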

Regarding python - Pandas.mean() TypeError: Could not convert to numeric, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44522741/
