
python - Given a pd.Interval column in a DataFrame, filter the rows whose interval contains a value

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:55:54

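I want to assign an interval to each row of a DataFrame so that the intervals do not overlap and together cover the whole possible range. I can then filter rows according to whether a given value falls inside the row's interval.

I used pd.Interval, but the "normal" way of filtering does not work: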

import pandas as pd

df = pd.DataFrame({
    "rating": ["bad", "average", "good"],
    "stars": [pd.Interval(left=0, right=2, closed="left"),
              pd.Interval(left=2, right=4, closed="left"),
              pd.Interval(left=4, right=5, closed="both")],
})
stars_val = 2.5
filtered_df = df[stars_val in df.stars]  # raises KeyError: False

It raises the following error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 try:
-> 2657 return self._engine.get_loc(key)
2658 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

Working code should give this result:

    rating   stars
1  average  [2, 4)

Best answer

If all of your intervals share the same closed value, the column is backed by an IntervalArray, and you can use the vectorized IntervalArray.contains method:

In [2]: np.random.seed(123)

In [3]: start = np.random.randint(100, size=1000)

In [4]: ia = pd.arrays.IntervalArray.from_arrays(start, start + 5)

In [5]: df = pd.DataFrame({'A': list('abcde') * 200, 'B': ia})

In [6]: df.head()
Out[6]:
A B
0 a (66, 71]
1 b (92, 97]
2 c (98, 103]
3 d (17, 22]
4 e (83, 88]

In [7]: df[df['B'].array.contains(70)]
Out[7]:
A B
0 a (66, 71]
20 a (68, 73]
23 d (67, 72]
27 c (66, 71]
45 a (69, 74]
87 c (67, 72]
111 b (65, 70]
128 d (68, 73]
133 d (65, 70]
135 a (67, 72]
155 a (65, 70]
177 c (69, 74]
193 d (67, 72]
217 c (69, 74]
221 b (66, 71]
223 d (69, 74]
227 c (66, 71]
243 d (66, 71]
250 a (67, 72]
251 b (65, 70]
263 d (68, 73]
407 c (65, 70]
419 e (69, 74]
425 a (65, 70]
446 b (69, 74]
449 e (69, 74]
451 b (66, 71]
523 d (66, 71]
552 c (68, 73]
589 e (66, 71]
609 e (69, 74]
613 d (68, 73]
627 c (69, 74]
637 c (68, 73]
650 a (67, 72]
674 e (69, 74]
711 b (69, 74]
769 e (67, 72]
777 c (69, 74]
800 a (66, 71]
803 d (68, 73]
818 d (69, 74]
822 c (67, 72]
883 d (66, 71]
889 e (68, 73]
944 e (67, 72]
953 d (69, 74]
966 b (65, 70]
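
The same call can be tried on a small table shaped like the one in the question. This is only a minimal sketch: it assumes the last interval can be changed to [4, 6) so that every interval is left-closed, and the breaks 0, 2, 4, 6 and the names df_same, mask and result_same are made up for illustration.

```python
import pandas as pd

# Hypothetical variant of the question's data: breaks 0, 2, 4, 6 with
# closed="left" give [0, 2), [2, 4), [4, 6), so all intervals share one closed side.
stars = pd.arrays.IntervalArray.from_breaks([0, 2, 4, 6], closed="left")
df_same = pd.DataFrame({"rating": ["bad", "average", "good"], "stars": stars})

stars_val = 2.5

# .array exposes the underlying IntervalArray; contains() returns a
# boolean NumPy array with one entry per row.
mask = df_same["stars"].array.contains(stars_val)
result_same = df_same[mask]
print(result_same)
```

Only the row whose interval is [2, 4) is kept, because 2.5 lies in that interval and in no other.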

Intervals with mixed closed values produce an object array, so you need to fall back to a less efficient implementation, as @ALollz suggested.
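
The question's own data mixes closed="left" and closed="both", so it falls into this slower case. The original attempt fails because `stars_val in df.stars` tests membership against the Series index labels, not its values, so it evaluates to the single boolean False, and `df[False]` then raises KeyError: False. One possible row-by-row fallback is sketched below; it is an assumption about what such a fallback could look like, not necessarily the code @ALollz posted. It relies on the fact that `value in pd.Interval(...)` respects the closed side of each interval.

```python
import pandas as pd

df = pd.DataFrame({
    "rating": ["bad", "average", "good"],
    "stars": [pd.Interval(left=0, right=2, closed="left"),
              pd.Interval(left=2, right=4, closed="left"),
              pd.Interval(left=4, right=5, closed="both")],
})

stars_val = 2.5

# Test each Interval object individually; this is a Python-level loop,
# so it is slower than IntervalArray.contains on large frames.
mask = df["stars"].apply(lambda interval: stars_val in interval)
filtered_df = df[mask]
print(filtered_df)

# Boundary check: 4 is excluded from [2, 4) but included in [4, 5].
boundary_df = df[df["stars"].apply(lambda interval: 4 in interval)]
```

Here filtered_df holds only the "average" row, matching the result requested in the question, and boundary_df holds only the "good" row.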

A similar question, "python - Given a pd.Interval column in a DataFrame, filter the rows whose interval contains a value", can be found on Stack Overflow: https://stackoverflow.com/questions/57719364/
