python - Non-conformable arrays in a quantile regression model from a pandas DataFrame using rpy2

Reposted. Author: 行者123. Updated: 2023-12-01 04:13:44

I am using rpy2 (2.7.6) to run a quantile regression on the engel data set:

import statsmodels.api as sm

from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri

pandas2ri.activate()

quantreg = importr('quantreg')

data = sm.datasets.engel.load_pandas().data

qreg = quantreg.rq('foodexp ~ income', data=data, tau=0.5)

However, this produces the following error:

qreg = quantreg.rq('foodexp ~ income', data=data, tau=0.5)
Traceback (most recent call last):

  File "<ipython-input-22-02ee1015737c>", line 1, in <module>
    quantreg.rq('foodexp ~ income', data=data, tau=0.5)

  File "C:\Anaconda\lib\site-packages\rpy2\robjects\functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)

  File "C:\Anaconda\lib\site-packages\rpy2\robjects\functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)

RRuntimeError: Error in y - x %*% z$coef : non-conformable arrays

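As I understand it, non-conformable arrays in this context means either that there are missing values or that the "arrays" being used differ in size. I can confirm that this is not the case: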

data.count()
Out[26]:
income 235
foodexp 235
dtype: int64

data.shape
Out[27]: (235, 2)

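What else could this error mean? Is it possible that the conversion from a DataFrame to a data.frame in rpy2 is not working correctly, or am I missing something here? Can anyone else confirm this error?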

Just in case, here is some information about the R and Python versions.

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

Python 2.7.11 |Anaconda 2.3.0 (64-bit)| (default, Dec 7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)]
on win32

Any help would be appreciated.

Edit 1:

If I load the data set directly from R, I do not get the error:

from rpy2.robjects import r

r.data('engel')
data = r['engel']

qreg = quantreg.rq('foodexp ~ income', data=data, tau=0.5)

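So I think something is wrong with the conversion done by pandas2ri. The same error occurs when I convert the DataFrame to a data.frame manually with pandas2ri.py2ri.

Edit 2:

Interestingly, the error disappears if I use the deprecated pandas.rpy.common.convert_to_r_dataframe: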

import pandas.rpy.common as com

rdata = com.convert_to_r_dataframe(data)

qreg = quantreg.rq('foodexp ~ income', data=rdata, tau=0.5)

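There is definitely a bug in pandas2ri, which has also been confirmed here.

Best answer

As answered on the rpy2 issue tracker:

The root of the problem seems to be that the columns of the pandas data frame are converted to Array objects, each with only one column.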

>>> pandas2ri.py2ri_pandasdataframe(data) 
<DataFrame - Python:0x7f8af3c2afc8 / R:0x92958b0>
[Array, Array]
income: <class 'rpy2.robjects.vectors.Array'>
<Array - Python:0x7f8af57ef908 / R:0x92e1bf0>
[420.157651, 541.411707, 901.157457, ..., 581.359892, 743.077243, 1057.676711]
foodexp: <class 'rpy2.robjects.vectors.Array'>
<Array - Python:0x7f8af3c2ab88 / R:0x92e7600>
[255.839425, 310.958667, 485.680014, ..., 468.000798, 522.601906, 750.320163]

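The distinction is a subtle one, but it seems to confuse the quantreg package. Other R functions appear to work either way, whether the object is a one-column array or a vector.

Converting the columns to R vectors seems to be what is needed to fix the problem: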
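
To make the array-versus-vector distinction concrete, here is a toy model in plain Python of the check that R appears to apply in elementwise arithmetic such as `y - x %*% z$coef`. It is a simplified sketch of my understanding of R's behavior, not R's actual implementation: the function name `r_arith_conformable` is made up, `None` stands for a plain vector without a dim attribute, and all length and recycling checks are left out.

```python
def r_arith_conformable(dim_x, dim_y):
    """Toy model of R's dim check for elementwise arithmetic.

    A dim of None stands for a plain vector (no dim attribute).
    Length and recycling checks are deliberately left out.
    """
    if dim_x is None or dim_y is None:
        # vector against array: the vector is recycled, dims are not compared
        return True
    # array against array: the dim attributes must be identical
    return tuple(dim_x) == tuple(dim_y)


n = 235              # number of rows in the engel data set
fitted_dim = (n, 1)  # x %*% coef yields an n-by-1 matrix

print(r_arith_conformable((n,), fitted_dim))  # column is a 1-d array -> False
print(r_arith_conformable(None, fitted_dim))  # column is a plain vector -> True
```

Under this model, a response column carrying a one-dimensional dim attribute cannot be subtracted from the n-by-1 matrix of fitted values, which would explain why the error shows up even though the data has no missing values and both columns have the same length.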

from rpy2.robjects.packages import importr

base = importr('base')
mydata = pandas2ri.py2ri_pandasdataframe(data)

# as.vector drops the dim attribute, so each column becomes a plain R vector
mydata[0] = base.as_vector(mydata[0])
mydata[1] = base.as_vector(mydata[1])

# now this is working
qreg = quantreg.rq('foodexp ~ income', data=mydata, tau=0.5)
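
For data frames with more than two columns, the same workaround can be written as a loop. The helper below is a hypothetical sketch (the name `force_vector_columns` is not part of rpy2); it takes the conversion function as a parameter so that it can be tried without an R session, and it assumes that `len()` of the converted data frame returns its number of columns.

```python
def force_vector_columns(rdf, as_vector):
    """Replace every column of `rdf` in place with `as_vector(column)`."""
    for i in range(len(rdf)):
        rdf[i] = as_vector(rdf[i])
    return rdf


# Stand-in demonstration with plain Python objects (no R session needed):
# tuples play the role of array columns, list() plays the role of as.vector.
columns = [(1.0, 2.0), (3.0, 4.0)]
force_vector_columns(columns, list)
print(columns)  # [[1.0, 2.0], [3.0, 4.0]]
```

With the objects from the answer, the call would presumably be `force_vector_columns(mydata, base.as_vector)`.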

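I would now like to collect more data on whether this fixes the problem without breaking anything else. To that end, I turned the fix into a custom converter derived from the pandas converter: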

from rpy2.robjects import default_converter
from rpy2.robjects import conversion
from rpy2.robjects.conversion import Converter, localconverter

from rpy2.robjects.packages import importr
from rpy2.robjects import numpy2ri, pandas2ri, vectors
import numpy

my_converter = Converter('my converter',
                         template=pandas2ri.converter)
base = importr('base')


def ndarray_forcevector(obj):
    func = numpy2ri.converter.py2ri.registry[numpy.ndarray]
    # current conversion as performed by numpy
    res = func(obj)
    if len(obj.shape) == 1:
        # force into an R vector
        res = base.as_vector(res)
    return res


@my_converter.py2ri.register(pandas2ri.PandasSeries)
def py2ri_pandasseries(obj):
    # this is a copy of the function with the same name in pandas2ri, with
    # the call to ndarray_forcevector() as the only difference
    if obj.dtype == '<M8[ns]':
        # time series
        d = [vectors.IntVector([x.year for x in obj]),
             vectors.IntVector([x.month for x in obj]),
             vectors.IntVector([x.day for x in obj]),
             vectors.IntVector([x.hour for x in obj]),
             vectors.IntVector([x.minute for x in obj]),
             vectors.IntVector([x.second for x in obj])]
        res = vectors.ISOdatetime(*d)
        # FIXME: can the POSIXct be created from the POSIXct constructor ?
        # (is '<M8[ns]' mapping to Python datetime.datetime ?)
        res = vectors.POSIXct(res)
    else:
        # converted as a numpy array
        res = ndarray_forcevector(obj)
    # "index" is equivalent to "names" in R
    if obj.ndim == 1:
        res.do_slot_assign('names',
                           vectors.StrVector(tuple(str(x) for x in obj.index)))
    else:
        res.do_slot_assign('dimnames',
                           vectors.SexpVector(conversion.py2ri(obj.index)))
    return res
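
The pattern used in ndarray_forcevector(), looking up the stock conversion in a registry and post-processing its result, can be illustrated with `functools.singledispatch` from the standard library, whose `.registry` attribute behaves the same way as far as I can tell. The sketch below is a toy that involves no rpy2 at all: `convert`, the dict-based "R object" and the `dim` key are invented stand-ins, and unlike the answer it overrides the entry in place rather than registering it on a separate converter.

```python
from functools import singledispatch


@singledispatch
def convert(obj):
    raise NotImplementedError(type(obj))


@convert.register(list)
def _convert_list(obj):
    # stand-in for the stock conversion: the result carries a 1-d "dim"
    return {"values": list(obj), "dim": (len(obj),)}


# look up the stock implementation, as ndarray_forcevector() does
stock = convert.registry[list]


def convert_list_forcevector(obj):
    res = stock(obj)
    if len(res["dim"]) == 1:
        # drop the dim attribute, playing the role of base.as_vector()
        res = {"values": res["values"], "dim": None}
    return res


# re-registering for the same type replaces the previous entry
convert.register(list, convert_list_forcevector)

print(convert([1.0, 2.0]))  # {'values': [1.0, 2.0], 'dim': None}
```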

The simplest way to use this new converter is probably in a context manager:

with localconverter(default_converter + my_converter) as cv:
    qreg = quantreg.rq('foodexp ~ income', data=data, tau=0.5)
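
The benefit of a context manager here is scoping: the custom conversion applies only inside the `with` block, and the previous behavior comes back afterwards. The toy below sketches that idea with the standard library only. It is not how rpy2 implements localconverter, and `_active`, `local_conversion` and `convert` are made-up names.

```python
from contextlib import contextmanager

# currently active conversion function (toy stand-in for a converter)
_active = {"py2ri": lambda obj: ("default", obj)}


@contextmanager
def local_conversion(py2ri):
    previous = _active["py2ri"]
    _active["py2ri"] = py2ri
    try:
        yield py2ri
    finally:
        # always restore the previous conversion, even after an exception
        _active["py2ri"] = previous


def convert(obj):
    return _active["py2ri"](obj)


with local_conversion(lambda obj: ("custom", obj)):
    print(convert(1))  # ('custom', 1)
print(convert(1))      # ('default', 1)
```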

Regarding "python - Non-conformable arrays in a quantile regression model from a pandas DataFrame using rpy2", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34560999/
