go - Reading a file through bufio with semi-complex sequencing through the file

Repost. Author: IT王子. Updated: 2023-10-29 02:21:16

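There may already be questions like this, but it is not an easy thing to search for. Basically, I have a file that is a set of protobufs, encoded and sequenced the way the protobuf spec normally lays them out.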

So think of the byte values being chunked like this throughout the file:

[EncodeVarInt(size of protobuf struct)] [protobuf struct bytes]

So a few bytes are read one at a time to obtain the size, and that size then drives one large read that covers the protobuf struct.
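
To make the layout concrete, here is a minimal sketch of writing and reading one such record in memory. It assumes the size prefix is a standard protobuf varint, which Go's encoding/binary package documents as the same encoding its Uvarint functions use. The names appendFrame and readFrame are hypothetical and are not part of the original code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendFrame (hypothetical helper) appends one record to dst
// as [uvarint length][payload].
func appendFrame(dst, payload []byte) []byte {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(payload)))
	dst = append(dst, hdr[:n]...)
	return append(dst, payload...)
}

// readFrame (hypothetical helper) decodes the first record in src and
// returns its payload together with the unread remainder of src.
func readFrame(src []byte) (payload, rest []byte, err error) {
	size, n := binary.Uvarint(src)
	if n <= 0 {
		return nil, nil, fmt.Errorf("bad length prefix")
	}
	if uint64(len(src)-n) < size {
		return nil, nil, fmt.Errorf("truncated record: want %d bytes, have %d", size, len(src)-n)
	}
	end := n + int(size)
	return src[n:end], src[end:], nil
}

func main() {
	var file []byte
	file = appendFrame(file, []byte("first"))
	file = appendFrame(file, make([]byte, 300)) // 300 needs a two-byte prefix

	for len(file) > 0 {
		payload, rest, err := readFrame(file)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(len(payload)) // prints 5, then 300
		file = rest
	}
}
```

A length below 128 fits in a single prefix byte. Larger lengths set the high bit on every prefix byte except the last, which is why a reader can test each byte against 127 to find where the prefix ends.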

My implementation, which uses the os ReadAt method on a file, currently looks something like this.

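// Next advances to the next length-prefixed feature in the file.
// It returns false once the end of the file is reached or a read fails.
func (geobuf *Geobuf_Reader) Next() bool {
	if geobuf.EndPos <= geobuf.Pos {
		return false
	}
	startpos := int64(geobuf.Pos)

	// A varint byte with the high bit set (> 127) means more bytes follow.
	for int(geobuf.Get_Byte(geobuf.Pos)) > 127 {
		geobuf.Pos += 1
	}
	geobuf.Pos += 1

	// Read the varint bytes that encode the size of the feature.
	sizebytes := make([]byte, geobuf.Pos-int(startpos))
	if _, err := geobuf.File.ReadAt(sizebytes, startpos); err != nil {
		return false
	}
	size, _ := DecodeVarint(sizebytes)

	// Record the size and offset of the feature, then skip past it.
	geobuf.Feat_Pos = [2]int{int(size), geobuf.Pos}
	geobuf.Pos = geobuf.Pos + int(size)
	return true
}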

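// Feature reads the current geobuf feature and returns it as GeoJSON.
// It returns nil if the raw bytes cannot be read.
func (geobuf *Geobuf_Reader) Feature() *geojson.Feature {
	// Read the raw bytes of the feature located by the last call to Next.
	a := make([]byte, geobuf.Feat_Pos[0])
	if _, err := geobuf.File.ReadAt(a, int64(geobuf.Feat_Pos[1])); err != nil {
		return nil
	}
	return Read_Feature(a)
}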

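How can I implement something like bufio, or another chunked reading mechanism, to speed up this many ReadAt calls on a file? Most bufio implementations I have seen rely on a specific delimiter. Thanks in advance, and hopefully this is not a horrible question.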

Best answer

Package bufio

import "bufio" 

type SplitFunc

SplitFunc is the signature of the split function used to tokenize the input. The arguments are an initial substring of the remaining unprocessed data and a flag, atEOF, that reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input and the next token to return to the user, plus an error, if any. If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.

If the returned error is non-nil, scanning stops and the error is returned to the client.

The function is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and, as always, holds unprocessed text.

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)

Use bufio.Scanner and write a custom SplitFunc for the protobuf struct framing.

Regarding "go - Reading a file through bufio with semi-complex sequencing through the file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48576299/
