
python - Assigning qcut as a new column

Reposted. Author: 太空狗. Updated: 2023-10-30 01:29:40

In the pandas notebook here:

http://nbviewer.ipython.org/urls/raw.github.com/carljv/Will_it_Python/master/ARM/ch5/arsenic_wells_switching.ipynb

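I see the result of qcut being assigned as a new column of a DataFrame. The DataFrame has two columns, yet assigning the qcut output to a new column somehow finds the correct level for the 'var' variable, and the other variable is never examined. What are the pandas semantics here? Sample output follows: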

In [2]:
from pandas import *
from statsmodels.formula.api import logit
from statsmodels.nonparametric import KDE
from patsy import dmatrix, dmatrices

In [3]:
df = read_csv('wells.dat', sep = ' ', header = 0, index_col = 0)
print(df.head())
   switch  arsenic       dist  assoc  educ
1       1     2.36  16.826000      0     0
2       1     0.71  47.321999      0     0
3       0     2.07  20.966999      0    10
4       1     1.15  21.486000      0    12
5       1     1.10  40.874001      1    14


In [4]:
model_form = ('switch ~ center(I(dist / 100.)) + center(arsenic) + ' +
              'center(I(educ / 4.)) + ' +
              'center(I(dist / 100.)) : center(arsenic) + ' +
              'center(I(dist / 100.)) : center(I(educ / 4.)) + ' +
              'center(arsenic) : center(I(educ / 4.))'
             )
model4 = logit(model_form, df = df).fit()

In [20]:
resid_df = DataFrame({'var': df['arsenic'], 'resid': model4.resid})
resid_df[:10]
Out [20]:
       resid   var
1   0.842596  2.36
2   1.281417  0.71
3  -1.613751  2.07
4   0.996195  1.15
5   1.005102  1.10
6   0.592056  3.90
7   0.941372  2.97
8   0.640139  3.24
9   0.886626  3.28
10  1.130149  2.52

In [15]:
qcut(df['arsenic'], 40)
Out [15]:
Categorical: arsenic
array([(2.327, 2.47], (0.68, 0.71], (1.953, 2.07], ..., [0.51, 0.53],
(0.62, 0.64], (0.64, 0.68]], dtype=object)
Levels (40): Index([[0.51, 0.53], (0.53, 0.56], (0.56, 0.59],
(0.59, 0.62], (0.62, 0.64], (0.64, 0.68],
(0.68, 0.71], (0.71, 0.75], (0.75, 0.78],
(0.78, 0.82], (0.82, 0.86], (0.86, 0.9], (0.9, 0.95],
(0.95, 1.0065], (1.0065, 1.0513], (1.0513, 1.1],
(1.1, 1.15], (1.15, 1.2], (1.2, 1.25], (1.25, 1.3],
(1.3, 1.36], (1.36, 1.42], (1.42, 1.49],
(1.49, 1.57], (1.57, 1.66], (1.66, 1.76],
(1.76, 1.858], (1.858, 1.953], (1.953, 2.07],
(2.07, 2.2], (2.2, 2.327], (2.327, 2.47],
(2.47, 2.61], (2.61, 2.81], (2.81, 2.98],
(2.98, 3.21], (3.21, 3.42], (3.42, 3.791],
(3.791, 4.475], (4.475, 9.65]], dtype=object)

In [17]:
resid_df['bins'] = qcut(df['arsenic'], 40)
resid_df[:20]
Out [17]:
      resid   var            bins
1  0.842596  2.36   (2.327, 2.47]
2  1.281417  0.71    (0.68, 0.71]
3 -1.613751  2.07   (1.953, 2.07]
4  0.996195  1.15     (1.1, 1.15]
5  1.005102  1.10   (1.0513, 1.1]
6  0.592056  3.90  (3.791, 4.475]
7  0.941372  2.97    (2.81, 2.98]
8  0.640139  3.24    (3.21, 3.42]

The correct bin was found for 'var'; the assignment paid no attention to 'resid'.

Best answer

The only general way I have found to do what the question title describes is:

import pandas as pd
# Assumes df is an existing DataFrame with a numeric 'ValToRank' column.
quartiles = pd.qcut(df['ValToRank'], 4, labels=range(1, 5))
df = df.assign(Quartile=quartiles.values)

This assigns the quartile rank values as a new DataFrame column, df['Quartile'].
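
As for the semantics the question asks about, a minimal sketch on invented data may help (the column names 'var' and 'resid' mirror the question; the values and the extra column names are made up). In current pandas versions, qcut applied to a Series returns a Series carrying the same index as its input, and assigning a Series to a DataFrame column aligns on that index. The bins depend only on the Series passed to qcut, which is why 'resid' is never consulted. Assigning .values, by contrast, is positional.

```python
import pandas as pd

# Invented data; 'var' and 'resid' mirror the column names in the question.
resid_df = pd.DataFrame(
    {"var": [2.36, 0.71, 2.07, 1.15, 1.10, 3.90, 2.97, 3.24],
     "resid": [0.84, 1.28, -1.61, 1.00, 1.01, 0.59, 0.94, 0.64]},
    index=range(1, 9),
)

# qcut sees only the 'var' values; labels=False returns integer bin codes.
bins = pd.qcut(resid_df["var"], 4, labels=False)

# The result is a Series with the same index as the input.
same_index = bins.index.equals(resid_df.index)

# Reverse the row order to show that Series assignment aligns on the index:
# each code still ends up next to the 'var' value it was computed from.
reversed_bins = bins.iloc[::-1]
resid_df["bins_aligned"] = reversed_bins

# Assigning the raw values is positional, so the reversed codes are
# written top to bottom without regard to the index.
resid_df["bins_positional"] = reversed_bins.values

print(resid_df)
```

When the result of qcut has the same index and order as the target DataFrame, as in the question and the answer above, both forms of assignment give the same column.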

A solution for a more generalized case, in which one wants to partition the cut by multiple columns, is given here.
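
For that grouped case, one common approach is to run qcut separately inside each group with groupby and transform. The sketch below is an assumption about what such a solution could look like, not a copy of the linked answer; the grouping columns 'region' and 'year' and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data: two grouping columns and one value column to rank.
df = pd.DataFrame({
    "region": ["a"] * 4 + ["b"] * 4,
    "year": [2020] * 8,
    "ValToRank": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Compute quartiles separately within each (region, year) group.
# labels=False yields integer codes 0..3; adding 1 gives ranks 1..4.
df["Quartile"] = (
    df.groupby(["region", "year"])["ValToRank"]
      .transform(lambda s: pd.qcut(s, 4, labels=False) + 1)
)

print(df)
```

Because transform returns a result indexed like the original rows, the new column lines up with df without any manual reordering. Groups with many tied values can make qcut raise an error about duplicate bin edges; the duplicates='drop' argument is one way to handle that.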

Regarding "python - Assigning qcut as a new column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14959722/
