python - How to retrieve elements from a groupby DataFrame given their index

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:50:46

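Context:

My DataFrame contains the following columns: HapID, Marker, Start_position, End_position. For each HapID, I want to get:
- the marker with the smallest Start_position (called leftMarker)
- the marker with the largest End_position (called rightMarker)
- the interval, i.e. the difference (largest End_position - smallest Start_position)

My problem is how to retrieve the marker names now that I know their indices. I get the following error, and although I have spent several hours on it, I am not sure how to fix it.

Here is the error message: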

AttributeError: Cannot access callable attribute 'iloc' of 'SeriesGroupBy' objects, try using the 'apply' method

Here is the data:

HapID   Marker  Start_position  End_position
hap_1 mk1 1107207 1107256
hap_1 mk2 1104711 1104760
hap_1 mk3 1106845 1106894
hap_2 mk4 11901413 11901462
hap_2 mk5 206031250 206031299
hap_2 mk6 11498893 11498942
hap_2 mk7 17236023 17236072
hap_2 mk8 11692209 11692258
hap_2 mk9 11691512 11691561
hap_2 mk10 11615664 11615713

Here is the expected output:

HapID   leftMarker  rightMarker Start_position  End_position    Interval
hap_1 mk2 mk1 1104711 1107256 2545
hap_2 mk6 mk5 11498893 206031299 194532406

Code:

import pandas as pd
data = {
'HapID':['hap_1','hap_1','hap_1','hap_2','hap_2','hap_2','hap_2','hap_2','hap_2','hap_2'],
'Marker':['mk1','mk2','mk3','mk4','mk5','mk6','mk7','mk8','mk9','mk10'],
'Start_position':[1107207,1104711,1106845,11901413,206031250,11498893,17236023,11692209,11691512,11615664],
'End_position':[1107256,1104760,1106894,11901462,206031299,11498942,17236072,11692258,11691561,11615713]}
df = pd.DataFrame(data)

haplotypes = df.groupby(df['HapID'])
posi_1 = haplotypes.Start_position.min()
posi_2 = haplotypes.End_position.max()
diff_posi = posi_2 - posi_1
a = haplotypes.Start_position.idxmin()  # row label of the minimum Start_position per HapID
b = haplotypes.End_position.idxmax()    # row label of the maximum End_position per HapID
#print('{} {} {}'.format(posi_1,posi_2,diff_posi))
#print('{} {}'.format(a,b))  # just to see if I'm getting the index

Now the question is how to retrieve the markers at those positions for each haplotype:

leftMarker = haplotypes.Marker.iloc(a)
rightMarker = haplotypes.Marker.iloc(b)

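Best answer

I think you need to retrieve the markers from the original DataFrame. idxmin() and idxmax() return row labels of df, so they can be passed straight to df.loc: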

leftMarker = df.loc[a,['HapID','Marker']]
rightMarker = df.loc[b,['HapID','Marker']]

print(leftMarker)

HapID Marker
1 hap_1 mk2
5 hap_2 mk6

print(rightMarker)

HapID Marker
0 hap_1 mk1
4 hap_2 mk5
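
The answer stops at the two marker lookups. As a rough sketch of how they could be combined into the expected output table from the question, something like the following might work. The names `grouped`, `left_idx`, `right_idx` and `summary` are illustrative and not from the original post, and the sketch assumes the default sorted group order of groupby, so that all per-group results line up by HapID.

```python
import pandas as pd

data = {
    'HapID': ['hap_1', 'hap_1', 'hap_1', 'hap_2', 'hap_2', 'hap_2', 'hap_2', 'hap_2', 'hap_2', 'hap_2'],
    'Marker': ['mk1', 'mk2', 'mk3', 'mk4', 'mk5', 'mk6', 'mk7', 'mk8', 'mk9', 'mk10'],
    'Start_position': [1107207, 1104711, 1106845, 11901413, 206031250, 11498893, 17236023, 11692209, 11691512, 11615664],
    'End_position': [1107256, 1104760, 1106894, 11901462, 206031299, 11498942, 17236072, 11692258, 11691561, 11615713],
}
df = pd.DataFrame(data)

grouped = df.groupby('HapID')

# Row labels (in df) of the extreme positions, one per HapID.
left_idx = grouped['Start_position'].idxmin()
right_idx = grouped['End_position'].idxmax()

# Look the markers up in the original frame by label, then drop the
# row labels (to_numpy) so every column aligns on HapID instead.
summary = pd.DataFrame(
    {
        'leftMarker': df.loc[left_idx, 'Marker'].to_numpy(),
        'rightMarker': df.loc[right_idx, 'Marker'].to_numpy(),
        'Start_position': grouped['Start_position'].min().to_numpy(),
        'End_position': grouped['End_position'].max().to_numpy(),
    },
    index=left_idx.index,  # the HapID keys, in group order
)
summary['Interval'] = summary['End_position'] - summary['Start_position']
summary = summary.reset_index()  # turn HapID back into a regular column

print(summary)
```

The lookup uses df.loc rather than df.iloc because idxmin() and idxmax() return index labels, not positions. With the default RangeIndex used here the two coincide, but they would differ after filtering or re-indexing the frame.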

Regarding "python - How to retrieve elements from a groupby DataFrame given their index", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48387268/
