
python - How to read data from two consecutive blocks at a time, out of many blocks, until the end of the file?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:17:41

If you can think of a better title, please update it!

I have tab-separated data with the following structure:

chr  pos  A_block  A_val
2    05   7        A,T,C
2    11   7        T,C,G
2    15   7        AT,C,G
2    21   7        C,A,GT
2    31   7        T,C,CA
2    42   9        T,C,G
2    55   9        C,G,GC
2    61   9        A,GC,T
2    05   12       AC,TG,G
2    11   12       A,TC,TG

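Expected output: for learning purposes, I just want to write an output file that is identical to the input file, but produced with the process I propose below.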

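I want to:
Step 01: read the values of only two consecutive blocks at a time (first blocks 7 and 9).
Step 02: store that data in a dictionary, with the block number as the primary unique key.
Step 03: pass that dictionary to a predefined function for parsing.
Then read blocks 9 and 12 and repeat the same process until the end of the file.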
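The three steps above can be sketched as a generator that holds at most two blocks in a dict keyed by block number. This is a minimal sketch, not the asker's code. It assumes the rows arrive already split into (chr, pos, A_block, A_val) tuples and that all rows of a block are contiguous. The name `two_block_dicts` is hypothetical.

```python
from collections import defaultdict


def two_block_dicts(rows):
    """Yield dicts holding two consecutive blocks: {block: [A_val, ...]}.

    Hypothetical helper; assumes the rows of one block are contiguous.
    A file with a single block yields nothing.
    """
    window = defaultdict(list)
    for chrom, pos, block, val in rows:
        if block not in window and len(window) == 2:
            # Step 03: a third block starts, so hand over the two complete ones
            yield dict(window)
            # keep only the later block (e.g. drop 7, keep 9) to bound memory
            later = list(window)[-1]
            window = defaultdict(list, {later: window[later]})
        # Steps 01/02: store the value under its block number
        window[block].append(val)
    if len(window) == 2:
        yield dict(window)


rows = [
    ("2", "05", "7", "A,T,C"),
    ("2", "42", "9", "T,C,G"),
    ("2", "55", "9", "C,G,GC"),
    ("2", "05", "12", "AC,TG,G"),
]
pairs = list(two_block_dicts(rows))
for pair in pairs:
    print(pair)
```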

I was thinking of something like this:

from collections import defaultdict


def parse_two_blocks(someData):
    # make a function that takes data from two blocks at a time
    for key, vals in someData.items():  # .items() gives (key, value) pairs
        # do ... something; as a placeholder, write the values back out
        for val in vals:
            print(key, val, sep='\t')


# Now, read the input file
with open('HaploBlock_toy.txt') as HaploBlocks:
    header = HaploBlocks.readline()  # only reads the first line as header

    # create an empty dict or a defaultdict: whichever is better?
    # defaultdict(list) lets us append without checking whether the key exists
    Hap_Dict = defaultdict(list)

    # for the rest of the lines
    for line in HaploBlocks:
        values = line.rstrip('\n').split('\t')
        Block = values[2]

        # len(Hap_Dict) counts the keys: when a third block starts, parse the
        # two complete blocks, then keep only the later one (clears memory)
        if Block not in Hap_Dict and len(Hap_Dict) == 2:
            parse_two_blocks(Hap_Dict)
            later = list(Hap_Dict)[-1]
            Hap_Dict = defaultdict(list, {later: Hap_Dict[later]})

        # append the data to the dict under its block number
        Hap_Dict[Block].append(values[3])

    # no new keys (end of file): parse the last pair, if any, and stop
    if len(Hap_Dict) == 2:
        parse_two_blocks(Hap_Dict)

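So when the code runs, it reads the data from blocks 7 and 9 until the dictionary is filled, and then passes it to the predefined function. Once parsing is done, it keeps only the data from the later of the two parsed blocks, so it only needs to read the remaining blocks.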
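A `collections.deque` with `maxlen=2` gives this "keep only the later block" behaviour directly: appending a third block drops the oldest one. This is a minimal sketch that assumes the blocks have already been grouped into hypothetical (block number, values) tuples.

```python
from collections import deque

# Hypothetical pre-grouped input: (block number, list of A_val strings)
blocks = [
    ("7", ["A,T,C", "T,C,G"]),
    ("9", ["T,C,G", "C,G,GC"]),
    ("12", ["AC,TG,G"]),
]

window = deque(maxlen=2)  # holds at most two blocks at any time
pairs = []
for block in blocks:
    window.append(block)  # with maxlen=2 this evicts the oldest block
    if len(window) == 2:
        pairs.append(dict(window))  # {block number: values} for this pair

for pair in pairs:
    print(pair)
```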

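Expected output: my main problem right now is being able to read two blocks at a time. I don't want to go into the details of how to parse the information inside "parse_two_blocks(someData)"; that is a separate question. For now, let's just try to write an output identical to the input.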

Best answer

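Parse the input into a lazy sequence (a generator) of blocks, then iterate over consecutive pairs of blocks. All of the work happens as you consume the pairs. In other words, none of these lines reads or stores the whole CSV file at once.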
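The same idea can be applied to a real file handle instead of an in-memory string. Rows are pulled from the file only as the pairs are consumed, and only one block at a time is turned into a list. This is a minimal sketch that assumes a tab-separated file with the header shown in the question. `iter_block_pairs` is a hypothetical name.

```python
import csv
import itertools
import os
import tempfile


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


def iter_block_pairs(path):
    """Lazily yield ((block, rows), (block, rows)) pairs from a tab-separated file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, dialect=csv.excel_tab)
        blocks = itertools.groupby(reader, key=lambda row: row["A_block"])
        # list(rows) materialises one block at a time, never the whole file
        yield from pairwise((block, list(rows)) for block, rows in blocks)


# Write a tiny sample file so the example runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("chr\tpos\tA_block\tA_val\n")
    tmp.write("2\t05\t7\tA,T,C\n2\t42\t9\tT,C,G\n2\t05\t12\tAC,TG,G\n")

pairs = [(a[0], b[0]) for a, b in iter_block_pairs(tmp.name)]
os.remove(tmp.name)
print(pairs)  # [('7', '9'), ('9', '12')]
```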

#!/usr/bin/env python3

import csv
import io
import itertools
import operator
from pprint import pprint

# The real file is tab-separated; the tabs are written as \t here so they
# survive copy and paste.
data = """chr\tpos\tA_block\tA_val
2\t05\t7\tA,T,C
2\t11\t7\tT,C,G
2\t15\t7\tAT,C,G
2\t21\t7\tC,A,GT
2\t31\t7\tT,C,CA
2\t42\t9\tT,C,G
2\t55\t9\tC,G,GC
2\t61\t9\tA,GC,T
2\t05\t12\tAC,TG,G
2\t11\t12\tA,TC,TG"""


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


def one():
    # read rows as lists of values
    c = csv.reader(io.StringIO(data), dialect=csv.excel_tab)
    # read header row
    keys = next(c)
    block_index = keys.index('A_block')
    # group rows by block numbers
    blocks = itertools.groupby(c, key=operator.itemgetter(block_index))
    # extract just the row values for each block
    row_values = (tuple(v) for k, v in blocks)
    # rearrange the values by column
    unzipped_values = (zip(*v) for v in row_values)
    # create a dictionary for each block
    dict_blocks = (dict(zip(keys, v)) for v in unzipped_values)
    yield from pairwise(dict_blocks)


def two():
    c = csv.DictReader(io.StringIO(data), dialect=csv.excel_tab)
    blocks = itertools.groupby(c, key=lambda x: x['A_block'])
    yield from pairwise((k, list(v)) for k, v in blocks)


for a, b in one():
    pprint(a)
    pprint(b)
    print()

Output (from one()):

{'A_block': ('7', '7', '7', '7', '7'),
'A_val': ('A,T,C', 'T,C,G', 'AT,C,G', 'C,A,GT', 'T,C,CA'),
'chr': ('2', '2', '2', '2', '2'),
'pos': ('05', '11', '15', '21', '31')}
{'A_block': ('9', '9', '9'),
'A_val': ('T,C,G', 'C,G,GC', 'A,GC,T'),
'chr': ('2', '2', '2'),
'pos': ('42', '55', '61')}

{'A_block': ('9', '9', '9'),
'A_val': ('T,C,G', 'C,G,GC', 'A,GC,T'),
'chr': ('2', '2', '2'),
'pos': ('42', '55', '61')}
{'A_block': ('12', '12'),
'A_val': ('AC,TG,G', 'A,TC,TG'),
'chr': ('2', '2'),
'pos': ('05', '11')}
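The question's stated goal is to write an output identical to the input. Every block except the first and last appears in two consecutive pairs. One way to avoid writing those blocks twice is to write both blocks of the first pair and only the second block of each later pair. This is a minimal sketch that assumes pairs shaped like those produced by two(), i.e. `(block, list_of_row_dicts)`. `write_pairs` is a hypothetical name.

```python
import csv
import io
import itertools


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


def write_pairs(pairs, out, fieldnames):
    """Write each block exactly once, even though inner blocks appear in two pairs.

    Note: a file with only one block yields no pairs, so only the header is written.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            dialect=csv.excel_tab, lineterminator="\n")
    writer.writeheader()
    for i, ((_, first_rows), (_, second_rows)) in enumerate(pairs):
        if i == 0:
            writer.writerows(first_rows)  # the first block only occurs in pair 0
        writer.writerows(second_rows)


data = ("chr\tpos\tA_block\tA_val\n"
        "2\t05\t7\tA,T,C\n"
        "2\t42\t9\tT,C,G\n"
        "2\t55\t9\tC,G,GC\n"
        "2\t05\t12\tAC,TG,G\n")
reader = csv.DictReader(io.StringIO(data), dialect=csv.excel_tab)
blocks = itertools.groupby(reader, key=lambda row: row["A_block"])
pairs = pairwise((k, list(v)) for k, v in blocks)

out = io.StringIO()
write_pairs(pairs, out, reader.fieldnames)
print(out.getvalue() == data)
```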

io.StringIO(string)

Takes a string and returns a file-like object whose contents are that string.
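A short illustration of why the answer wraps `data` in io.StringIO: csv.reader expects a file-like object, and StringIO lets an in-memory string stand in for one.

```python
import csv
import io

# io.StringIO turns a string into a file-like object that csv.reader can consume
fake_file = io.StringIO("chr\tpos\tA_block\tA_val\n2\t05\t7\tA,T,C\n")
rows = list(csv.reader(fake_file, dialect=csv.excel_tab))
print(rows)  # [['chr', 'pos', 'A_block', 'A_val'], ['2', '05', '7', 'A,T,C']]
```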

csv.DictReader(file_object, dialect) from the csv module

Returns a dict for each row (an OrderedDict before Python 3.8), using the field names taken from the first row as the dictionary keys for the field values.
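For example, with the header from the question, each row is accessed by column name. This is a small sketch that uses one sample row.

```python
import csv
import io

text = "chr\tpos\tA_block\tA_val\n2\t05\t7\tA,T,C\n"
reader = csv.DictReader(io.StringIO(text), dialect=csv.excel_tab)
row = next(reader)              # the header row is consumed automatically
print(reader.fieldnames)        # ['chr', 'pos', 'A_block', 'A_val']
print(row["A_block"], row["A_val"])  # 7 A,T,C
```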

groupby(iterable, key_function)

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element.
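groupby only merges consecutive items with equal keys. Both one() and two() therefore assume that all rows of a block are adjacent, as they are in the sample data. A short illustration:

```python
import itertools

blocks = ["7", "7", "9", "9", "12", "7"]
# groupby only merges *consecutive* equal keys, so the trailing "7" is a new group
groups = [(key, len(list(group))) for key, group in itertools.groupby(blocks)]
print(groups)  # [('7', 2), ('9', 2), ('12', 1), ('7', 1)]
```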

lambda x: x['A_block']

An anonymous (lambda) function that takes an input named x and returns its value for the key 'A_block'.
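The lambda in two() does the same job as operator.itemgetter. one() uses itemgetter with a column index instead, because csv.reader yields lists rather than dicts. A small comparison:

```python
import operator

row = {"chr": "2", "pos": "05", "A_block": "7", "A_val": "A,T,C"}

# Both key functions extract the same value from a DictReader-style row
key_functions = [lambda x: x["A_block"], operator.itemgetter("A_block")]
results = [key(row) for key in key_functions]
print(results)  # ['7', '7']
```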

(k, list(v)) for k, v in blocks

groupby() returns each group's values as an iterator that can only be used once and becomes invalid as soon as groupby() advances to the next group. list(v) copies those values into a list.
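To see why the list(v) conversion matters: once groupby moves on to the next key, the previous group's iterator yields nothing.

```python
import itertools

it = itertools.groupby(["7", "7", "9", "9"])
key1, group1 = next(it)
key2, group2 = next(it)  # advancing groupby invalidates group1
first = list(group1)
second = list(group2)
print(first)   # [] (the values of block 7 are no longer reachable)
print(second)  # ['9', '9']
```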

pairwise(iterable): a recipe from the itertools documentation (available as itertools.pairwise since Python 3.10)

"s -> (s0,s1), (s1,s2), (s2, s3), ..."
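On Python 3.10 and later, the built-in itertools.pairwise can replace the recipe. This sketch falls back to the recipe from the answer on older versions.

```python
import itertools

try:
    from itertools import pairwise  # built in since Python 3.10
except ImportError:
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)

print(list(pairwise(["7", "9", "12"])))  # [('7', '9'), ('9', '12')]
```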

Regarding "python - How to read data from two consecutive blocks at a time, out of many blocks, until the end of the file?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48152273/
