
python - Regression with pandas raises the error: cannot concatenate 'str' and 'float' objects

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:49:38

I have been writing code based on this answer (Reading csv to array, performing linear regression on array and writing to csv in Python depending on gradient) to find out which days show an increase in wind speed during the morning.

Here is a sample of my data:

hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level 
pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
hd, 40842,2000,03,20,10,50,2000,03,20,10,50,2000,03,20,00,50, ,N, 25.7,N, 25.7,N, 25.6,N, 21.5,N, 21.5,N, 21.4,N, 19.2,N, 19.2,N, 19.0,N, 67,N, 68,N, 66,N, 13,N, 9,N,100,N, 4,N, 15,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,51,2000,03,20,10,51,2000,03,20,00,51, 0.0,N, 25.6,N, 25.8,N, 25.6,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.4,N, 19.2,N, 68,N, 68,N, 66,N, 11,N, 9,N,107,N, 11,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,52,2000,03,20,10,52,2000,03,20,00,52, 0.0,N, 25.8,N, 25.8,N, 25.6,N, 21.7,N, 21.7,N, 21.5,N, 19.5,N, 19.5,N, 19.2,N, 68,N, 69,N, 66,N, 11,N, 9,N, 83,N, 13,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,53,2000,03,20,10,53,2000,03,20,00,53, 0.0,N, 25.8,N, 25.9,N, 25.8,N, 21.6,N, 21.8,N, 21.6,N, 19.3,N, 19.6,N, 19.3,N, 67,N, 68,N, 66,N, 9,N, 8,N, 87,N, 14,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,54,2000,03,20,10,54,2000,03,20,00,54, 0.0,N, 25.8,N, 25.8,N, 25.8,N, 21.6,N, 21.6,N, 21.6,N, 19.3,N, 19.3,N, 19.2,N, 67,N, 67,N, 67,N, 8,N, 4,N, 98,N, 23,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,55,2000,03,20,10,55,2000,03,20,00,55, 0.0,N, 25.7,N, 25.8,N, 25.7,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.3,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 4,N, 68,N, 15,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,56,2000,03,20,10,56,2000,03,20,00,56, 0.0,N, 25.9,N, 25.9,N, 25.7,N, 21.7,N, 21.7,N, 21.5,N, 19.4,N, 19.4,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 5,N, 69,N, 16,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,57,2000,03,20,10,57,2000,03,20,00,57, 0.0,N, 26.0,N, 26.0,N, 25.9,N, 21.8,N, 21.8,N, 21.7,N, 19.5,N, 19.5,N, 19.4,N, 67,N, 68,N, 66,N, 9,N, 5,N, 72,N, 10,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,58,2000,03,20,10,58,2000,03,20,00,58, 0.0,N, 26.0,N, 26.1,N, 26.0,N, 21.7,N, 21.8,N, 21.7,N, 19.4,N, 19.5,N, 19.3,N, 66,N, 67,N, 66,N, 8,N, 5,N, 69,N, 13,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#

Here is the code I tried:

import glob
import pandas as pd
import numpy as np
from datetime import datetime

for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
    df = pd.read_csv(file)

    col = 'Wind (1 minute) speed in km/h'
    mask = pd.notnull(df[col])
    df = df.loc[mask]

    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
        morning_data = group[group.HH24.between(9, 12)]
        gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0:
            print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))

However, this produces:

runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
import glob
Traceback (most recent call last):

File "<ipython-input-19-ace8af14da2c>", line 1, in <module>
runfile('X:/python/linearregression.py', wdir='X:/python')

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)

File "X:/python/linearregression.py", line 10, in <module>
gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit
y = NX.asarray(y) + 0.0

TypeError: cannot concatenate 'str' and 'float' objects

If I try to convert the year values to integers or floats, e.g. int('Year Month Day Hours Minutes in YYYY') or int('MM'), it produces the error ValueError: invalid literal for int() with base 10: 'Year Month Day Hours Minutes in YYYY'

With Unutbu's help, the TypeError above has been resolved. The code now produces the following error instead.

runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
import glob
C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
Traceback (most recent call last):

File "<ipython-input-24-ace8af14da2c>", line 1, in <module>
runfile('X:/python/linearregression.py', wdir='X:/python')

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)

File "X:/python/linearregression.py", line 17, in <module>
wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 570, in average
avg = a.mean(axis)

File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\core\_methods.py", line 72, in _mean
ret = ret / rcount

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Best answer

The error message

  File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit
y = NX.asarray(y) + 0.0

TypeError: cannot concatenate 'str' and 'float' objects

can be reproduced if y is a Series that contains strings:

In [14]: np.asarray(pd.Series(['',1.0])) + 0.0
TypeError: cannot concatenate 'str' and 'float' objects
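The traceback above comes from Python 2. On Python 3 the same operation still fails with a TypeError, only worded differently. The following is a minimal sketch, assuming a reasonably recent pandas and NumPy; the helper name try_add_zero is made up for this illustration and is not part of either library.

```python
import numpy as np
import pandas as pd

def try_add_zero(series):
    """Mimic the `NX.asarray(y) + 0.0` step from polyfit and report the outcome."""
    try:
        np.asarray(series) + 0.0
    except TypeError as exc:
        return "TypeError: {}".format(exc)
    return "ok"

mixed = pd.Series(['', 1.0])    # object dtype: one string, one float
clean = pd.Series([2.0, 1.0])   # float64 dtype

print(mixed.dtype, try_add_zero(mixed))
print(clean.dtype, try_add_zero(clean))
```

An object dtype on a column that should be numeric is usually the first hint that a string is hiding somewhere in it.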

Now if you peek at line 550 inside polynomial.py, you will see that y is the second argument passed to np.polyfit. So this strongly suggests that morning_data['Wind (1 minute) speed in km/h'] is a Series containing strings.

The sample data you posted does not show any strings in that column, but somewhere in your CSV there is probably a string in it.

So how can we find that string? One way is to convert the Series to numeric values, coercing invalid strings to NaN:

col = 'Wind (1 minute) speed in km/h'
tmp = pd.to_numeric(morning_data[col], errors='coerce')

Then look for the NaNs:

mask = pd.isnull(tmp)
print(morning_data.loc[mask, col])

This will show all the values in the 'Wind (1 minute) speed in km/h' column that could not be converted to a number.
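To see the two steps in isolation, here is a small sketch on invented data. The values 'calm' and '###' are hypothetical stand-ins for whatever stray text the real CSV contains, and the column is forced to object dtype to imitate a mixed-type column.

```python
import pandas as pd

col = 'Wind (1 minute) speed in km/h'

# Hypothetical sample: three valid readings and two unparseable entries.
sample = pd.DataFrame({
    col: pd.Series(['13', 'calm', '11', '###', '9'], dtype=object),
})

tmp = pd.to_numeric(sample[col], errors='coerce')  # unparseable entries become NaN
mask = pd.isnull(tmp)                              # True where conversion failed
bad_values = sample.loc[mask, col]

print(bad_values)
print(bad_values.value_counts())  # how often each offending string occurs
```

Counting the offending strings with value_counts can help decide whether they are a handful of typos or a systematic marker used by whatever produced the file.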

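Then you can decide what to do with these problematic rows. If there are only a few of them, you could edit them by hand. Or look at how the CSV was generated and fix the error at its source. Or, if you want to discard these rows, you could use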

for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
    df = pd.read_csv(file)

    for col in ['Wind (1 minute) speed in km/h',
                'Wind (1 minute) direction in degrees true']:
        df[col] = pd.to_numeric(df[col], errors='coerce')
        mask = pd.notnull(df[col])
        df = df.loc[mask]

    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
        morning_data = group[group.HH24.between(9, 12)]
        if len(morning_data) == 0: continue
        gradient, intercept = np.polyfit(morning_data['HH24'], morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0:
            print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))

Then the rest of the code should have a chance of working.
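For anyone without the weather files at hand, the same idea can be tried on a small in-memory DataFrame. This is only a sketch under stated assumptions: the readings are invented, the function name rising_mornings is hypothetical, dropna is used as a shorthand for the two successive masks above, and the check for at least two distinct hours is an extra guard that the code above does not have.

```python
import numpy as np
import pandas as pd
from datetime import datetime

YEAR = 'Year Month Day Hours Minutes in YYYY'
SPEED = 'Wind (1 minute) speed in km/h'
DIRECTION = 'Wind (1 minute) direction in degrees true'

# Invented readings: one per hour on two days, with one unparseable speed.
df = pd.DataFrame({
    YEAR: [2000] * 8,
    'MM': [3] * 8,
    'DD': [20, 20, 20, 20, 21, 21, 21, 21],
    'HH24': [9, 10, 11, 12, 9, 10, 11, 12],
    SPEED: pd.Series(['8', 'bad', '11', '13', '14', '12', '10', '9'], dtype=object),
    DIRECTION: pd.Series(['100', '107', '83', '87', '98', '68', '69', '72'], dtype=object),
})

def rising_mornings(frame):
    """Return (date, gradient, mean direction) for days whose 9-12h wind speed trends up."""
    frame = frame.copy()
    for col in [SPEED, DIRECTION]:
        frame[col] = pd.to_numeric(frame[col], errors='coerce')
    frame = frame.dropna(subset=[SPEED, DIRECTION])  # discard rows that failed to convert

    results = []
    for date, group in frame.groupby([YEAR, 'MM', 'DD']):
        morning = group[group['HH24'].between(9, 12)]
        if morning['HH24'].nunique() < 2:
            continue  # extra guard: no meaningful slope from a single hour value
        gradient, intercept = np.polyfit(morning['HH24'], morning[SPEED], 1)
        if gradient > 0:
            day = datetime(*(int(part) for part in date))
            results.append((day, gradient, morning[DIRECTION].mean()))
    return results

results = rising_mornings(df)
for day, gradient, wind_direction in results:
    print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(day, gradient, wind_direction))
```

Only the first day is reported here, because the second day's invented speeds fall through the morning and the row holding 'bad' is dropped before fitting.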

Regarding "python - Regression with pandas raises the error: cannot concatenate 'str' and 'float' objects", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36706334/
