python - How to read the data of two consecutive blocks at a time, out of many blocks, until the end of the file?


Reposted by: 太空宇宙 | Updated: 2023-11-03 14:17:41


If you can think of a better title, please update it!

I have data with the following structure:

chr    pos    A_block    A_val
  2     05       7       A,T,C
  2     11       7       T,C,G
  2     15       7       AT,C,G
  2     21       7       C,A,GT
  2     31       7       T,C,CA
  2     42       9       T,C,G
  2     55       9       C,G,GC
  2     61       9       A,GC,T
  2     05      12       AC,TG,G
  2     11      12       A,TC,TG

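Expected output: for learning purposes, I just want to write an output file that is identical to the input file, but using the process I propose below.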

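What I want: Step 01: read the values of only two consecutive blocks at a time (first 7 and 9) -> Step 02: store that data in a dictionary, with the block number as the primary unique key -> Step 03: pass that dictionary to a predefined function for parsing -> then read blocks (9 and 12) -> repeat the same process until the end.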

I was thinking of something like this:

from collections import defaultdict


def parse_two_blocks(someData):
    ''' make a function that takes data from two blocks at a time '''
    for key, vals in someData.items():
        # parsing details left out; this only shows what each block holds
        print(key, vals)


# Now, read the input file (tab-separated, header on the first line)
with open('HaploBlock_toy.txt') as HaploBlocks:
    header = HaploBlocks.readline()  # only reads the first line as header

    # defaultdict(list) rather than {}: a new key starts as an empty list,
    # so .append() works without first checking whether the key exists
    Hap_Dict = defaultdict(list)

    # for the rest of the lines
    for line in HaploBlocks:
        values = line.strip('\n').split('\t')
        Block = values[2]

        # len(Hap_Dict) counts the unique block keys stored so far; a new key
        # while two are already stored means both stored blocks are complete
        if Block not in Hap_Dict and len(Hap_Dict) == 2:
            parse_two_blocks(Hap_Dict)
            # keep only the later block (dicts preserve insertion order)
            later = list(Hap_Dict)[-1]
            Hap_Dict = defaultdict(list, {later: Hap_Dict[later]})

        Hap_Dict[Block].append(values[3])

    # end of file: parse whatever pair (or single block) is left
    if Hap_Dict:
        parse_two_blocks(Hap_Dict)

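So when the code runs, it will read the data from blocks 7 and 9 until the dictionary is filled, and then hand the dictionary to the predefined function. Once that parse is done, it can keep only the data from the later block of the previous pair, so that it only has to read the remaining blocks.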
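The "keep only the later block" step can also be expressed with a bounded queue instead of manual dictionary bookkeeping. This is a minimal sketch, swapping in `collections.deque(maxlen=2)` (appending a third block silently drops the oldest) and `itertools.groupby`; the `rows` list and the `two_block_windows` name are hypothetical stand-ins for the file reading.

```python
from collections import deque
from itertools import groupby

# Hypothetical input: (block, value) pairs as they would come out of the file, in order
rows = [("7", "A,T,C"), ("7", "T,C,G"), ("9", "T,C,G"), ("9", "C,G,GC"), ("12", "AC,TG,G")]


def two_block_windows(rows):
    """Yield a dict holding two consecutive blocks at a time."""
    window = deque(maxlen=2)  # holds at most two (block, values) entries
    for block, group in groupby(rows, key=lambda row: row[0]):
        # materialize this block's values before groupby moves on
        window.append((block, [value for _, value in group]))
        if len(window) == 2:
            yield dict(window)


for pair in two_block_windows(rows):
    print(pair)
```

Note that in this sketch a file with only one block yields no window at all.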

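Expected output: the main problem for me right now is being able to read two blocks at a time. I don't want to go into the details of how the information is parsed inside parse_two_blocks(someData); that is a separate question. For now, let's just try to write an output identical to the input.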
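Because consecutive pairs overlap (block 9 is in both (7, 9) and (9, 12)), writing both blocks of every pair would duplicate rows. A minimal sketch of the "identical output" goal, assuming a tab-separated file whose block rows are contiguous: write only the first block of each pair, then the final block once at the end. `block_pairs` is a hypothetical helper, and an in-memory string stands in for `HaploBlock_toy.txt`.

```python
import csv
import io
import itertools

# Hypothetical stand-in for HaploBlock_toy.txt (tab-separated, header first)
data = "chr\tpos\tA_block\tA_val\n2\t05\t7\tA,T,C\n2\t42\t9\tT,C,G\n2\t55\t9\tC,G,GC\n2\t05\t12\tAC,TG,G\n"


def block_pairs(reader):
    """Yield (previous_block_rows, current_block_rows) for consecutive blocks."""
    previous = None
    for _, group in itertools.groupby(reader, key=lambda row: row[2]):
        current = list(group)
        if previous is not None:
            yield previous, current
        previous = current


reader = csv.reader(io.StringIO(data), delimiter="\t")
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(next(reader))  # copy the header line

last = None
for first, second in block_pairs(reader):
    writer.writerows(first)  # only the first block, so overlapping blocks are written once
    last = second
if last is not None:
    writer.writerows(last)  # the final block only ever appears as a second element

print(out.getvalue() == data)
```

As written, a file containing a single block produces no pair and that block would not be written; handling that case needs one extra check.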

Best answer

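Parse the input into a lazy sequence (a generator) of blocks, then iterate over consecutive pairs. Everything is evaluated on demand as you step through the pairs; none of these lines reads or stores the whole CSV file at once.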

#!/usr/bin/env python3

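# the columns must be tab-separated, because the readers below use csv.excel_tab
data = """chr\tpos\tA_block\tA_val
2\t05\t7\tA,T,C
2\t11\t7\tT,C,G
2\t15\t7\tAT,C,G
2\t21\t7\tC,A,GT
2\t31\t7\tT,C,CA
2\t42\t9\tT,C,G
2\t55\t9\tC,G,GC
2\t61\t9\tA,GC,T
2\t05\t12\tAC,TG,G
2\t11\t12\tA,TC,TG"""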

import csv
import io
import itertools
import collections
import operator
from pprint import pprint

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def one():
    # read rows as tuples of values
    c = csv.reader(io.StringIO(data), dialect=csv.excel_tab)
    # read header row
    keys = next(c)
    block_index = keys.index('A_block')
    # group rows by block numbers
    blocks = itertools.groupby(c, key=operator.itemgetter(block_index))
    # extract just the row values for each block
    row_values = (tuple(v) for k, v in blocks)
    # rearrange the values by column
    unzipped_values = (zip(*v) for v in row_values)
    # create a dictionary for each block
    dict_blocks = (dict(zip(keys, v)) for v in unzipped_values)
    yield from pairwise(dict_blocks)


def two():
    c = csv.DictReader(io.StringIO(data), dialect=csv.excel_tab)
    blocks = itertools.groupby(c, key=lambda x: x['A_block'])
    yield from pairwise((k, list(v)) for k, v in blocks)


for a, b in one():
    pprint(a)
    pprint(b)
    print()

Output (of one()):

{'A_block': ('7', '7', '7', '7', '7'),
 'A_val': ('A,T,C', 'T,C,G', 'AT,C,G', 'C,A,GT', 'T,C,CA'),
 'chr': ('2', '2', '2', '2', '2'),
 'pos': ('05', '11', '15', '21', '31')}
{'A_block': ('9', '9', '9'),
 'A_val': ('T,C,G', 'C,G,GC', 'A,GC,T'),
 'chr': ('2', '2', '2'),
 'pos': ('42', '55', '61')}

{'A_block': ('9', '9', '9'),
 'A_val': ('T,C,G', 'C,G,GC', 'A,GC,T'),
 'chr': ('2', '2', '2'),
 'pos': ('42', '55', '61')}
{'A_block': ('12', '12'),
 'A_val': ('AC,TG,G', 'A,TC,TG'),
 'chr': ('2', '2'),
 'pos': ('05', '11')}
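The answer defines two() but never calls it, and reads from a string. A minimal sketch of the same two() reading a real tab-separated file lazily, assuming the file is opened with `newline=""` as the csv documentation recommends; the sample file is written to a temporary path purely so the sketch runs on its own, and `seen` is a hypothetical name for collecting results.

```python
import csv
import itertools
import os
import tempfile


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


def two(path):
    # same pipeline as the answer's two(), but the rows come from a file on disk
    with open(path, newline="") as f:
        rows = csv.DictReader(f, dialect=csv.excel_tab)
        blocks = itertools.groupby(rows, key=lambda x: x["A_block"])
        yield from pairwise((k, list(v)) for k, v in blocks)


# hypothetical sample file so the sketch is self-contained
sample = "chr\tpos\tA_block\tA_val\n2\t05\t7\tA,T,C\n2\t42\t9\tT,C,G\n2\t05\t12\tAC,TG,G\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

seen = []
for (key_a, rows_a), (key_b, rows_b) in two(tmp.name):
    seen.append((key_a, key_b))
    print(key_a, [r["pos"] for r in rows_a], "|", key_b, [r["pos"] for r in rows_b])

os.remove(tmp.name)
```

Each pair yields the block key plus a list of row dictionaries, so parse_two_blocks from the question could receive `dict([(key_a, rows_a), (key_b, rows_b)])` if a dictionary keyed by block is preferred.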

io.StringIO(string)

Take a string and return a file-like object that contains the contents of string.
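A small illustration of why io.StringIO works as a stand-in for the real file: it supports readline() and line iteration, so the csv reader continues from wherever reading stopped. With a real file, the usual replacement would be `open('HaploBlock_toy.txt', newline='')`; the sample text here is made up.

```python
import csv
import io

fake_file = io.StringIO("chr\tpos\n2\t05\n2\t11\n")  # behaves like a text file opened for reading
header = fake_file.readline()  # reads only the first line, like HaploBlocks.readline()
rows = list(csv.reader(fake_file, delimiter="\t"))  # csv picks up from the current position
print(repr(header), rows)
```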

csv.DictReader(file_object, dialect) from the csv module

Returns a dictionary for each row (an OrderedDict before Python 3.8), using the field names taken from the very first row as the keys for the field values.
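A quick look at what DictReader produces for the first data row of the sample input, assuming the same tab-separated header; `fieldnames` holds the keys read from the header line.

```python
import csv
import io

data = "chr\tpos\tA_block\tA_val\n2\t05\t7\tA,T,C\n"
reader = csv.DictReader(io.StringIO(data), dialect=csv.excel_tab)
first_row = next(reader)  # one dictionary per data row
print(reader.fieldnames)
print(first_row)
```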

groupby(iterable, key_function)

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element.
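Because groupby() only merges adjacent items, the answer relies on each block's rows being contiguous in the file, as they are in the sample. The block list below is hypothetical and shows what happens when a block number reappears later; sorting by block first would fix that, but sorted() reads the whole input into memory.

```python
import itertools

blocks = ["7", "7", "9", "9", "7"]  # hypothetical: block 7 shows up again at the end
groups = [(k, len(list(g))) for k, g in itertools.groupby(blocks)]
print(groups)  # [('7', 2), ('9', 2), ('7', 1)]
```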

lambda x: x['A_block']

An anonymous (lambda) function that takes an input named x and returns its value for the key 'A_block'
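one() uses operator.itemgetter(block_index) on list rows, while two() uses a lambda on dictionary rows. For dictionary rows the two are interchangeable, as this sketch with a made-up row shows.

```python
import operator

row = {"chr": "2", "pos": "05", "A_block": "7", "A_val": "A,T,C"}

# two equivalent key functions for groupby()
key_functions = [lambda x: x["A_block"], operator.itemgetter("A_block")]
results = [key(row) for key in key_functions]
print(results)  # ['7', '7']
```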

(k, list(v)) for k, v in blocks

For each group, groupby() returns an iterator over its values that can only be used once, and only until groupby() advances to the next group. This converts that iterator to a list.
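A short demonstration of why list(v) is needed: if the group iterators are stored and read only after groupby() has moved on, the earlier groups come back empty. The block list is a made-up sample.

```python
import itertools

rows = ["7", "7", "9", "12"]

# Wrong: keep the group iterators and read them later
lazy = [(k, g) for k, g in itertools.groupby(rows)]
lazy_contents = [(k, list(g)) for k, g in lazy]
print(lazy_contents)  # earlier groups are empty

# Right: materialize each group while it is still current
eager = [(k, list(g)) for k, g in itertools.groupby(rows)]
print(eager)  # [('7', ['7', '7']), ('9', ['9']), ('12', ['12'])]
```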

pairwise(iterable) recipe (from the itertools documentation; Python 3.10+ also provides it as itertools.pairwise)

"s -> (s0,s1), (s1,s2), (s2, s3), ..."
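How the recipe produces overlapping pairs, applied to the block numbers from the sample: tee() makes two iterators over the same input, the second is advanced by one item, and zip() stops when it runs out. Note that a single block produces no pairs.

```python
import itertools
import sys


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)  # two independent iterators over the same input
    next(b, None)                   # advance the second one by a single item
    return zip(a, b)                # zip stops when the shorter iterator (b) is exhausted


pairs = list(pairwise(["7", "9", "12"]))
print(pairs)  # [('7', '9'), ('9', '12')]

if sys.version_info >= (3, 10):
    # the built-in version gives the same result
    assert list(itertools.pairwise(["7", "9", "12"])) == pairs
```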

For "python - How to read the data of two consecutive blocks at a time, out of many blocks, until the end of the file?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48152273/
