
python - How do I compute element frequencies for pairwise comparisons of a list in Python?

Reposted. Author: 行者123. Updated: 2023-12-01 03:38:17

I have my samples stored in the following list:

 sample = ['AAAA', 'CGCG', 'TTTT', 'AT-T', 'CATC']

To illustrate the problem, I refer to them below as "sets":

Set1 AAAA
Set2 CGCG
Set3 TTTT
Set4 AT-T
Set5 CATC
  1. Eliminate every set in which all elements are identical to one another.

Output:

 Set2 CGCG
Set4 AT-T
Set5 CATC
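  • Perform pairwise comparisons between the sets (Set2 vs Set4, Set2 vs Set5, Set4 vs Set5).

  • Each pairwise comparison may contain only two types of combination; if it does not, eliminate that comparison. For example,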

    Set2    Set5
    C C
    G A
    C T
    G C
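  • Here there are more than two types of pair: (CC), (GA), (CT) and (GC). So this pairwise comparison cannot be used.

    Each comparison may contain only 2 types of combination, drawn from (AA, GG, CC, TT, AT, TA, AC, CA, AG, GA, GC, CG, GT, TG, CT, TC), i.e. essentially all possible combinations of ACGT when order matters.

    In the given example, more than 2 such combinations are found.

    Hence Set2 and Set5, and Set4 and Set5, cannot be considered. The only pair left is therefore: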

    Output
    Set2 CGCG
    Set4 AT-T
  • In this pairwise comparison, remove any element that is "-" together with its corresponding element in the other set.

    Output    
    Set2 CGG
    Set4 ATT
  • Compute the frequency of each element in Set2 and in Set4. Also compute the frequency of each type of pair across the two sets (CA and GT pairs).

    Output
    Set2 (C = 1/3, G = 2/3)
    Set4 (A = 1/3, T = 2/3)
    Pairs (CA = 1/3, GT = 2/3)
  • Compute float(a) = (Pairs) - (Set2) * (Set4) for the corresponding elements (either pair will do).

    eg. For CA pairs, float (a) = (freq of CA pairs) - (freq of C) * (freq of A)
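  • Note: if the pair were AAAC and CCCA, the frequency of C would be 1/4, i.e. the frequency of the base within one member of the pair.

  • Compute

    float(b) = float(a) / [(freq of C in CGG) * (freq of G in CGG) * (freq of A in ATT) * (freq of T in ATT)]
  • Repeat this for all pairwise comparisons.

  • For example: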

    Set2 CGCG
    Set4 AT-T
    Set6 GCGC

    Set2 vs Set4, Set2 vs Set6, Set4 vs Set6
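
The arithmetic in the worked example (CGG against ATT) can be checked with exact fractions. This is only a sketch: it assumes the divisor in float(b) is the product of all four base frequencies, which is one reading of the formula as written, and the variable names are illustrative.

```python
from fractions import Fraction

# Sequences left after dropping the gapped column from CGCG / AT-T.
set2 = "CGG"
set4 = "ATT"

# Column-wise pairs: ('C', 'A'), ('G', 'T'), ('G', 'T')
pairs = list(zip(set2, set4))
n = len(pairs)

# Base frequencies, each taken within its own set.
freq_c = Fraction(set2.count("C"), n)
freq_g = Fraction(set2.count("G"), n)
freq_a = Fraction(set4.count("A"), n)
freq_t = Fraction(set4.count("T"), n)

# Frequency of the CA pair among all columns.
freq_ca = Fraction(pairs.count(("C", "A")), n)

float_a = freq_ca - freq_c * freq_a
# Assumption: the divisor is the product of all four base frequencies.
float_b = float_a / (freq_c * freq_g * freq_a * freq_t)

print(float_a)  # 2/9
print(float_b)  # 9/2
```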

    My half-baked code so far: ** I would prefer all suggested code to use standard for-loop format rather than comprehensions **

    #Step 1
    for i in sample:
        for j in range(i):
            if j = j+1 #This needs to be corrected to if all elements in i identical to each other i.e. if all "j's" are the same
                del i
    #insert line of code where sample1 = new sample with deletions as above

    #Step 2
    for i,i+1 in enumerate(sample):
        #Step 3
        for j in range(i):
            for k in range (i+1):
                #insert line of code to say only two types of pairs can be included, if yes continue else skip
                #Step 4
                if j = "-" or k = "-":
                    #Delete j/k and the corresponding element in the other pair
                #Step 5
                count_dict = {}
                square_dict = {}
                for base in list(i):
                    if base in count_dict:
                        count_dict[base] += 1
                    else:
                        count_dict[base] = 1
                for allele in count_dict:
                    freq = (count_dict[allele] / len(i)) #frequencies of individual alleles
                #Calculate frequency of pairs
                #Step 6
                #No code yet

    Best answer

    I think this is what you want:

    from collections import Counter

    # Remove elements where all nucleobases are the same.
    for index in range(len(sample) - 1, -1, -1):
        if sample[index][:1] * len(sample[index]) == sample[index]:
            del sample[index]

    for indexA, setA in enumerate(sample):
        for indexB, setB in enumerate(sample):
            # Don't compare samples with themselves nor compare same pair twice.
            if indexA <= indexB:
                continue

            # Calculate number of unique pairs
            pair_count = Counter()
            for pair in zip(setA, setB):
                if '-' not in pair:
                    pair_count[pair] += 1

            # Only analyse pairs of sets with 2 unique pairs.
            if len(pair_count) != 2:
                continue

            # Count individual bases.
            base_counter = Counter()
            for pair, count in pair_count.items():
                base_counter[pair[0]] += count
                base_counter[pair[1]] += count

            # Get the length of one of each item in the pair.
            sequence_length = sum(pair_count.values())

            # Convert counts to frequencies.
            base_freq = {}
            for base, count in base_counter.items():
                base_freq[base] = count / float(sequence_length)

            # Examine a pair from the two unique pairs to calculate float_a.
            pair = list(pair_count)[0]
            float_a = (pair_count[pair] / float(sequence_length)) - base_freq[pair[0]] * base_freq[pair[1]]

            # Step 7!
            float_b = float_a / float(base_freq.get('A', 0) * base_freq.get('T', 0) * base_freq.get('C', 0) * base_freq.get('G', 0))
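
Two details of the loop above are worth keeping in mind. It pools the base counts of both sets into one table, so AAAC against CCCA would give C a frequency of 1.0 rather than the 1/4 mentioned in the question. Its divisor is also zero, raising ZeroDivisionError, whenever one of A, T, C or G is absent from the pair. The sketch below is one possible alternative reading, not the answer's own method: it counts each base within its own set and multiplies only the frequencies of bases that actually occur. The function name `pair_statistics` and that divisor rule are assumptions made for illustration.

```python
from collections import Counter


def pair_statistics(seq_a, seq_b):
    """Return (float_a, float_b) for two aligned sequences, or None if skipped.

    Hypothetical helper: per-set frequencies, divisor built only from
    bases that are present (an assumption, not stated in the question).
    """
    # Keep only the columns where neither sequence has a gap.
    kept_a = []
    kept_b = []
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a != '-' and base_b != '-':
            kept_a.append(base_a)
            kept_b.append(base_b)

    # Only analyse comparisons with exactly two distinct pair types.
    pair_count = Counter(zip(kept_a, kept_b))
    if len(pair_count) != 2:
        return None

    length = float(len(kept_a))
    count_a = Counter(kept_a)
    count_b = Counter(kept_b)

    # Product of every base frequency, each taken within its own sequence.
    denominator = 1.0
    for count in count_a.values():
        denominator *= count / length
    for count in count_b.values():
        denominator *= count / length

    # Either pair type may be used; take the alphabetically first one
    # so that the choice is deterministic.
    pair = sorted(pair_count)[0]
    freq_pair = pair_count[pair] / length
    float_a = freq_pair - (count_a[pair[0]] / length) * (count_b[pair[1]] / length)
    return float_a, float_a / denominator


print(pair_statistics('CGCG', 'AT-T'))
print(pair_statistics('CGCG', 'CATC'))  # None: four distinct pair types
```

For the question's example (CGCG against AT-T) this gives float_a of about 2/9 and float_b of about 4.5, matching the hand calculation with C = 1/3, G = 2/3, A = 1/3, T = 2/3.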

    Or, more Pythonically (using the list/dict comprehensions you said you did not want):

    from collections import Counter

    BASES = 'ATCG'

    # Remove elements where all nucleobases are the same.
    sample = [item for item in sample if item[:1] * len(item) != item]

    for indexA, setA in enumerate(sample):
        for indexB, setB in enumerate(sample):
            # Don't compare samples with themselves nor compare same pair twice.
            if indexA <= indexB:
                continue

            # Calculate number of unique pairs
            relevant_pairs = [(elA, elB) for (elA, elB) in zip(setA, setB) if elA != '-' and elB != '-']
            pair_count = Counter(relevant_pairs)

            # Only analyse pairs of sets with 2 unique pairs.
            if len(pair_count) != 2:
                continue

            # setA and setB as tuples with pairs involving '-' removed.
            # Bound to new names so the outer loop's setA is not overwritten
            # before the next comparison.
            strippedA, strippedB = zip(*relevant_pairs)

            # Get the total for each base.
            seq_length = len(strippedA)

            # Convert counts to frequencies.
            base_freq = {base : count / float(seq_length) for (base, count) in (Counter(strippedA) + Counter(strippedB)).items()}

            # Examine a pair from the two unique pairs to calculate float_a.
            pair = list(pair_count)[0]
            float_a = (pair_count[pair] / float(seq_length)) - base_freq[pair[0]] * base_freq[pair[1]]

            # Step 7!
            denominator = 1
            for base in BASES:
                denominator *= base_freq.get(base, 0)

            float_b = float_a / denominator
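
The nested `enumerate` loops with the `indexA <= indexB` guard visit each unordered pair of sets exactly once. As a sketch of the same idea in plain for-loop style, `itertools.combinations` from the standard library yields those pairs directly. The names `filtered` and `comparisons` are illustrative, and the `len(set(seq)) > 1` test is a swapped-in equivalent of the answer's `item[:1] * len(item) != item` check.

```python
from itertools import combinations

sample = ['AAAA', 'CGCG', 'TTTT', 'AT-T', 'CATC']

# Step 1 with a plain loop: keep only sequences that contain
# more than one distinct character.
filtered = []
for seq in sample:
    if len(set(seq)) > 1:
        filtered.append(seq)

# Each unordered pair appears exactly once, in list order.
comparisons = []
for seq_a, seq_b in combinations(filtered, 2):
    comparisons.append((seq_a, seq_b))

print(filtered)     # ['CGCG', 'AT-T', 'CATC']
print(comparisons)  # [('CGCG', 'AT-T'), ('CGCG', 'CATC'), ('AT-T', 'CATC')]
```

The three pairs correspond to Set2 vs Set4, Set2 vs Set5 and Set4 vs Set5 in the question.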

    Regarding "python - How do I compute element frequencies for pairwise comparisons of a list in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40066439/
