
python - Numba error "sequence item 0: expected str instance, type found"

Reposted. Author: 行者123. Updated: 2023-11-28 17:29:25

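I want to select variables for a multiple regression analysis. I tried the code from http://planspace.org/20150423-forward_selection_with_statsmodels/. The problem is that I want to select from 50 variables, which takes too long. I used Numba to make it faster and wrote the following code: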

@jit
def forward_selected(data, response):
    """Linear model designed by forward selection.

    Parameters:
    -----------
    data : pandas DataFrame with all possible predictors and response

    response: string, name of response column in data

    Returns:
    --------
    model: an "optimal" fitted statsmodels linear model
           with an intercept
           selected by forward selection
           evaluated by adjusted R-squared
    """
    remaining = set(data.columns)
    remaining.remove(response)
    selected = [str]
    current_score, best_new_score = 0.0, 0.0
    while remaining and current_score == best_new_score:
        scores_with_candidates = [str]
        for candidate in remaining:
            formula = "{} ~ {} + 1".format(response,
                                           ' + '.join(selected + [candidate]))
            score = smf.ols(formula, data).fit().rsquared_adj
            scores_with_candidates.append((score, candidate))
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop()
        if current_score < best_new_score:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score
    formula = "{} ~ {} + 1".format(response,
                                   ' + '.join(selected))
    model = smf.ols(formula, data).fit()
    return model

model = forward_selected(df, col)

But it returns the following error:

TypeError: sequence item 0: expected str instance, type found

Please tell me how to fix it. If my question is unclear, I am happy to provide more information in the comments.

Traceback (most recent call last):
  File "~/PycharmProjects/anacondaenv/touhu_1.py", line 164, in <module>
    submit = forecast(col)
  File "~/PycharmProjects/anacondaenv/touhu_1.py", line 75, in forecast
    model = forward_selected(df, col)
TypeError: sequence item 0: expected str instance, type found

Best answer

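I think one of the best ways to see whether numba really acts as a booster is to try the njit decorator instead of jit. njit forces nopython mode and fails if there is any fallback to Python (a fallback gives no speed advantage at all). Short answer: don't use anything except np.ndarrays. So no strings, no tuples, no lists, and no calls to functions that aren't compiled.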
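To illustrate the point about njit, here is a minimal sketch of the kind of function Numba can compile in nopython mode: a loop over a NumPy array that uses only numeric scalars. The function name `sum_of_squares` is made up for this example, and the `try`/`except` fallback is an assumption added so that the snippet still runs where Numba is not installed (in which case it is plain, uncompiled Python).

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba; nothing is compiled then.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Sum of squared entries of a 1-D float array."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


result = sum_of_squares(np.arange(4.0))
print(result)
```

By contrast, forward_selected takes a pandas DataFrame, builds formula strings and calls smf.ols, none of which Numba compiles, so decorating it with njit would be expected to fail rather than speed it up.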

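So I fixed the error: numba doesn't like empty lists in the main body of the function... no idea why (maybe a bug?!), but if you move the initialization into the while block it works: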

import statsmodels.formula.api as smf
import numba as nb

@nb.jit
def forward_selected_nojit(data, response):
    """Linear model designed by forward selection.

    Parameters:
    -----------
    data : pandas DataFrame with all possible predictors and response

    response: string, name of response column in data

    Returns:
    --------
    model: an "optimal" fitted statsmodels linear model
           with an intercept
           selected by forward selection
           evaluated by adjusted R-squared
    """
    remaining = set(data.columns)
    remaining.remove(response)
    selected = None  # Changed this line
    current_score, best_new_score = 0.0, 0.0
    while remaining and current_score == best_new_score:
        if selected is None:  # Changed this and next line
            selected = []
        scores_with_candidates = []
        for candidate in remaining:
            formula = "{} ~ {} + 1".format(response,
                                           ' + '.join(selected + [candidate]))
            score = smf.ols(formula, data).fit().rsquared_adj
            scores_with_candidates.append((score, candidate))
        scores_with_candidates.sort()
        best_new_score, best_candidate = scores_with_candidates.pop()
        if current_score < best_new_score:
            remaining.remove(best_candidate)
            selected.append(best_candidate)
            current_score = best_new_score
    formula = "{} ~ {} + 1".format(response,
                                   ' + '.join(selected))
    model = smf.ols(formula, data).fit()
    return model
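
As a side note on the message itself: the code in the question initializes `selected = [str]`, so the list holds the built-in type `str`, and `str.join` rejects any item that is not a string. The sketch below uses plain Python with no Numba involved and reproduces the same message; the column name `rk` is only a placeholder.

```python
selected = [str]   # the type object itself, as in the question
candidate = "rk"   # placeholder column name

try:
    " + ".join(selected + [candidate])
except TypeError as exc:
    message = str(exc)

print(message)

# Starting from an empty list, the join succeeds.
fixed = " + ".join([] + [candidate])
print(fixed)
```

This would explain why starting from a real empty list, as the rewritten function does, makes the TypeError disappear.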

This could probably be solved in a nicer way, but what matters here is the timing. First, though, check that numba does not produce anything strange:

# With numba
sl ~ rk + yr + 1
0.835190760538

# Without numba
sl ~ rk + yr + 1
0.835190760538

So the results are the same. Now let's see how they perform:

# with numba
10 loops, best of 3: 264 ms per loop

# without numba
10 loops, best of 3: 252 ms per loop

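So this is exactly what I expected. When you use Python types and call uncompiled external functions, you get no speed gain. You can probably make it faster with numba, but be sure to read the numba documentation and check what is supported: Python types and NumPy types.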
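
The timings above have the shape of IPython's %timeit output. A rough equivalent using the standard-library timeit module is sketched below. Here `work` is a hypothetical stand-in for a call such as forward_selected(df, col), and the loop counts (10 loops, best of 3) are chosen only to mirror the output above.

```python
import timeit


def work():
    # Hypothetical stand-in for forward_selected(df, col).
    return sum(i * i for i in range(10_000))


loops = 10
# Three repeats of `loops` calls each; every entry is the total time of one repeat.
totals = timeit.repeat(work, repeat=3, number=loops)
best_ms = min(totals) / loops * 1000.0
print("{} loops, best of 3: {:.3g} ms per loop".format(loops, best_ms))
```

Taking the minimum over repeats reduces the influence of background noise, which matters when two variants differ by only a few percent, as they do here.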

Regarding python - Numba error "sequence item 0: expected str instance, type found", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35605333/
