python-hypothesis - How to set the min and max length of data frames in Hypothesis?

Reposted. Author: 行者123. Updated: 2023-12-04 01:56:06

I have the following strategy for creating data frames of genomics data:

from hypothesis.extra.pandas import columns, data_frames, column
import hypothesis.strategies as st


def mysort(tp):
    # Reorder the tuple to (chromosome, smaller position, larger position,
    # strand), so that Start <= End in every generated row.
    key = [-1, tp[1], tp[2], int(1e10)]
    return [x for _, x in sorted(zip(key, tp))]


positions = st.integers(min_value=0, max_value=int(1e7))
strands = st.sampled_from("+ -".split())
chromosomes = st.sampled_from(
    elements=["chr{}".format(str(e)) for e in list(range(1, 23)) + "X Y M".split()])

genomics_data = data_frames(
    columns=columns(["Chromosome", "Start", "End", "Strand"], dtype=int),
    rows=st.tuples(chromosomes, positions, positions, strands).map(mysort))

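I am not interested in empty data frames, since they are invalid, and I would also like to generate some really long ones. How can I change the size of the data frames created for test cases? That is, a minimum size of 1 and a large average size?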

Best answer

You can pass the data_frames constructor an index argument. The index strategy (range_indexes here) accepts min_size and max_size options:

from hypothesis.extra.pandas import data_frames, columns, range_indexes
import hypothesis.strategies as st


def mysort(tp):
    # Reorder the tuple to (chromosome, smaller position, larger position,
    # strand), so that Start <= End in every generated row.
    key = [-1, tp[1], tp[2], int(1e10)]
    return [x for _, x in sorted(zip(key, tp))]


chromosomes = st.sampled_from(
    ["chr{}".format(str(e)) for e in list(range(1, 23)) + "X Y M".split()])

positions = st.integers(min_value=0, max_value=int(1e7))
strands = st.sampled_from("+ -".split())

# range_indexes(min_size=5) makes every generated data frame at least 5 rows long.
dfs = data_frames(
    index=range_indexes(min_size=5),
    columns=columns("Chromosome Start End Strand".split(), dtype=int),
    rows=st.tuples(chromosomes, positions, positions, strands).map(mysort))

This produces data frames like the following:

  Chromosome    Start      End Strand
0      chr11  1411202  8025685      +
1      chr18   902289  5026205      -
2      chr12  5343877  9282475      +
3      chr16  2279196  8294893      -
4      chr14  1365623  6192931      -
5      chr12  4602782  9424442      +
6      chr10   136262  1739408      +
7      chr15   521644  4861939      +
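
The answer mentions max_size but only demonstrates min_size. As a minimal sketch of using both bounds together, the snippet below builds a deliberately simplified frame with two integer columns and checks the length inside a Hypothesis test. The names MIN_ROWS, MAX_ROWS, bounded_frames, observed_lengths and check_length_is_bounded are made up for this illustration and are not part of the original question or answer. The bounds are kept small so the example runs quickly.

```python
from hypothesis import given, settings
from hypothesis.extra.pandas import columns, data_frames, range_indexes

# Hypothetical bounds, chosen only for this illustration.
MIN_ROWS = 1
MAX_ROWS = 20

# A simplified frame (two integer columns) whose row count is controlled
# entirely by the index strategy.
bounded_frames = data_frames(
    columns=columns(["Start", "End"], dtype=int),
    index=range_indexes(min_size=MIN_ROWS, max_size=MAX_ROWS),
)

# Record the lengths Hypothesis actually generated, for inspection.
observed_lengths = []


@settings(max_examples=30, deadline=None)
@given(df=bounded_frames)
def check_length_is_bounded(df):
    observed_lengths.append(len(df))
    # min_size=1 rules out empty frames; max_size caps the row count.
    assert MIN_ROWS <= len(df) <= MAX_ROWS


# Calling the decorated function runs the property over many generated frames.
check_length_is_bounded()
```

Setting min_size=1 is enough to exclude empty data frames, and raising min_size is the direct way to force longer ones. This sketch only bounds the length; it does not attempt to control the average size of the generated frames.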

This post on "python-hypothesis - How to set the min and max length of data frames in Hypothesis?" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/50623734/
