
python - Linear regression with a Pandas DataFrame

Reposted. Author: 太空狗. Updated: 2023-10-29 18:17:19

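I have a DataFrame in pandas that I'm using to produce a scatter plot, and I want to include a regression line on the plot. Right now I'm trying to do this with polyfit.

Here is my code: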

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from numpy import *

table1 = pd.DataFrame.from_csv('upregulated_genes.txt', sep='\t', header=0, index_col=0)
table2 = pd.DataFrame.from_csv('misson_genes.txt', sep='\t', header=0, index_col=0)
table1 = table1.join(table2, how='outer')

table1 = table1.dropna(how='any')
table1 = table1.replace('#DIV/0!', 0)

# scatterplot
plt.scatter(table1['log2 fold change misson'], table1['log2 fold change'])
plt.ylabel('log2 expression fold change')
plt.xlabel('log2 expression fold change Misson et al. 2005')
plt.title('Root Early Upregulated Genes')
plt.axis([0,12,-5,12])

# this is the part I'm unsure about
regres = polyfit(table1['log2 fold change misson'], table1['log2 fold change'], 1)

plt.show()

But I get the following error:

TypeError: cannot concatenate 'str' and 'float' objects

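Does anyone know where I'm going wrong? I'm also unsure how to add the regression line to my plot. Any other general comments on my code would be hugely appreciated too; I'm still a beginner.

Best answer

Instead of replacing '#DIV/0!' by hand, force the data to be numeric. This does two things at once: it ensures that the result is a numeric type (not str), and it substitutes NaN for any entry that cannot be parsed as a number. Example: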

In [5]: pd.to_numeric(pd.Series([1, 2, 'blah', '#DIV/0!']), errors='coerce')
Out[5]:
0    1.0
1    2.0
2    NaN
3    NaN
dtype: float64
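Applied to a table like the one in the question, the same idea might look like the sketch below. In current pandas, `pd.to_numeric(..., errors='coerce')` performs this coercion (the older `convert_objects` method has been removed). The DataFrame contents here are made-up stand-ins for the joined table, chosen so that the valid rows lie on y = 2x + 1; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the joined table in the question:
# a few valid values plus entries that cannot be parsed as numbers.
df = pd.DataFrame({
    "log2 fold change misson": ["1.0", "2.0", "#DIV/0!", "4.0", "5.0"],
    "log2 fold change": [3.0, 5.0, 7.0, "blah", 11.0],
})

# Coerce every column to a numeric dtype; unparseable entries become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop rows where either value is missing, then fit a degree-1 polynomial.
clean = numeric.dropna(how="any")
slope, intercept = np.polyfit(
    clean["log2 fold change misson"], clean["log2 fold change"], 1
)
print(slope, intercept)
```

Coercing before calling `dropna` matters: in the question the order was reversed, so the '#DIV/0!' strings were still present when the missing rows were dropped.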

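This should resolve your error. On the general topic of fitting a line to data, though, I have two convenient approaches that I prefer to polyfit. The second of the two is more robust (and can return much more detailed information about the statistics), but it requires statsmodels.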

import numpy as np
import pandas as pd
from scipy.stats import linregress

def fit_line1(x, y):
    """Return slope, intercept of best fit line."""
    # Remove rows where either x or y is NaN.
    clean_data = pd.concat([x, y], axis=1).dropna(axis=0)
    x_clean, y_clean = clean_data.iloc[:, 0], clean_data.iloc[:, 1]
    slope, intercept, r, p, stderr = linregress(x_clean, y_clean)
    return slope, intercept  # could also return stderr

import statsmodels.api as sm

def fit_line2(x, y):
    """Return slope, intercept of best fit line."""
    X = sm.add_constant(x)  # prepend a column of ones for the intercept
    model = sm.OLS(y, X, missing='drop')  # ignores entries where x or y is NaN
    fit = model.fit()
    intercept, slope = np.asarray(fit.params)  # order follows X: constant first
    return slope, intercept  # could also return stderr in each via fit.bse
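The comments above mention that the standard error could be returned as well. As a small sketch of what `linregress` makes available beyond the slope and intercept, the example below reads the extra statistics from the result object. The five data points are invented for illustration.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data scattered around y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

result = linregress(x, y)

# Fitted line.
print("slope:", result.slope, "intercept:", result.intercept)
# Correlation coefficient, p-value for a zero-slope null hypothesis,
# and the standard error of the estimated slope.
print("r:", result.rvalue, "p:", result.pvalue, "stderr:", result.stderr)
```

Whether these are worth returning from a helper depends on whether the plot needs to report goodness of fit, for example by putting r squared in the legend.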

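To plot the fitted line, do something like this:

import numpy as np

# x and y are the numeric columns being compared (pandas Series)
m, b = fit_line2(x, y)
N = 100  # could be just 2 if you are only drawing a straight line...
points = np.linspace(x.min(), x.max(), N)
plt.plot(points, m*points + b)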
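Putting the pieces together, a self-contained sketch of a scatter plot with a regression line could look like the following. It uses `np.polyfit`, the approach the question started from, rather than either helper above, and the data are randomly generated placeholders. The axis labels and the output file name `regression.png` are likewise assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for the two fold-change columns.
rng = np.random.default_rng(0)
x = pd.Series(np.linspace(0, 12, 50))
y = 0.8 * x + 1.0 + rng.normal(scale=0.5, size=len(x))

# np.polyfit returns coefficients from highest degree down: slope, intercept.
m, b = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")

# Two end points are enough to draw a straight line.
ends = np.array([x.min(), x.max()])
ax.plot(ends, m * ends + b, color="red", label=f"fit: y = {m:.2f}x + {b:.2f}")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("regression.png")
```

Drawing the line from `x.min()` to `x.max()` keeps it within the range of the observed data, so the plot does not suggest the fit extends beyond it.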

Regarding python - Linear regression with a Pandas DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19379295/
