
mongodb - Golang: gzipping the cursor data of a MongoDB find query and writing it to a file gives an error on decompression

Reposted. Author: 行者123. Updated: 2023-12-01 21:10:39

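I am iterating over a MongoDB cursor, gzipping the data, and sending it to an S3 object. When I try to decompress the uploaded file with gzip -d, I get the following error: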

gzip: 9.log.gz: invalid compressed data--crc error
gzip: 9.log.gz: invalid compressed data--length error

Below is the code I use to iterate, compress, and upload:
// CursorReader struct acts as reader wrapper on top of mongodb cursor
type CursorReader struct {
	Csr *mongo.Cursor
}

// Read func reads the data from cursor and puts it into byte array
func (cr *CursorReader) Read(p []byte) (n int, err error) {
	dataAvail := cr.Csr.Next(context.TODO())
	if !dataAvail {
		n = 0
		err = io.EOF
		if cr.Csr.Close(context.TODO()) != nil {
			fmt.Fprintf(os.Stderr, "Error: MongoDB: getting logs: close cursor: %s", err)
		}
		return
	}
	var b bytes.Buffer
	w := gzip.NewWriter(&b)
	w.Write([]byte(cr.Csr.Current.String() + "\n"))
	w.Close()
	n = copy(p, []byte(b.String()))
	err = nil
	return
}

cursor, err := coll.Find(ctx, filter) // runs the find query and returns cursor
csrRdr := new(CursorReader)           // creates a new cursorreader instance
csrRdr.Csr = cursor                   // assigning the find cursor to cursorreader instance
_, err = s3Uploader.Upload(&s3manager.UploadInput{ // Uploading the data to s3 in parts
	Bucket: aws.String("bucket"),
	Key:    aws.String("key"),
	Body:   csrRdr,
})

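If the data is small, I don't get the error, but if the data is large, I do. What I have debugged so far: I tried to compress 1500 documents, each 15MB in size, and got the error. I also tried writing the gzipped bytes directly to a local file and got the same error.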

Best answer

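The problem appears to be the repeated call to gzip.NewWriter() in func (*CursorReader) Read([]byte) (int, error).
You are allocating a new gzip.Writer on every call to Read. gzip compression is stateful, so you must use a single Writer instance for all operations.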
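The point about using one Writer can be checked without MongoDB or S3. The following is a minimal sketch, not the original author's code: the rows slice stands in for the cursor, and compressAll, decompress, and lossyRead are hypothetical helper names introduced here. It also illustrates a second detail of the Read method in the question that is worth checking: copy transfers at most len(p) bytes, so compressed output larger than p is cut off.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressAll writes every row through one gzip.Writer and closes it once,
// producing a single complete gzip stream.
func compressAll(rows []string) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	for _, row := range rows {
		if _, err := io.WriteString(gz, row+"\n"); err != nil {
			return nil, err
		}
	}
	// Close flushes buffered data and writes the CRC-32 and length trailer.
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress inflates a gzip stream back into a string.
func decompress(data []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

// lossyRead mimics the Read method from the question: it compresses one row
// with a fresh gzip.Writer and copies the result into p. It returns how many
// bytes were copied and how many bytes the compressor actually produced.
func lossyRead(row string, p []byte) (copied, produced int) {
	var b bytes.Buffer
	w := gzip.NewWriter(&b)
	w.Write([]byte(row + "\n"))
	w.Close()
	return copy(p, b.Bytes()), b.Len()
}

func main() {
	// Hypothetical sample rows standing in for cursor.Current.String().
	rows := []string{`{"_id": 1}`, `{"_id": 2}`}

	data, err := compressAll(rows)
	if err != nil {
		panic(err)
	}
	text, err := decompress(data)
	fmt.Printf("decompressed %q, err: %v\n", text, err)

	// A destination smaller than the compressed row loses the tail.
	copied, produced := lossyRead(rows[0], make([]byte, 16))
	fmt.Printf("copied %d of %d bytes\n", copied, produced)
}
```

Whenever copied is smaller than produced, the bytes that did not fit never reach the output, which would be consistent with the error showing up only for large documents.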

Solution #1

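A fairly straightforward solution to your problem is to read all rows from the cursor, pass them through a gzip.Writer, and store the compressed content in an in-memory buffer.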

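cursor, err := collection.Find(context.TODO(), filter)
if err != nil {
	// handle error somehow ¯\_(ツ)_/¯
}
defer cursor.Close(context.TODO())

// prepare a buffer to hold gzipped data
var buffer bytes.Buffer
gz := gzip.NewWriter(&buffer)

for cursor.Next(context.TODO()) {
	// one document per line, as in the question
	if _, err = io.WriteString(gz, cursor.Current.String()+"\n"); err != nil {
		// handle error somehow ¯\_(ツ)_/¯
	}
}

// close the gzip stream before the buffer is read: Close writes the
// CRC and length trailer, so a deferred Close would run too late
if err = gz.Close(); err != nil {
	// handle error somehow ¯\_(ツ)_/¯
}

// you can now use buffer as io.Reader
// and it'll contain gzipped data for your serialized rows
// (s3Uploader is the s3manager uploader from the question)
_, err = s3Uploader.Upload(&s3manager.UploadInput{
	Bucket: aws.String("..."),
	Key:    aws.String("..."),
	Body:   &buffer,
})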
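One detail of the in-memory approach deserves care: the buffer only holds a complete gzip stream after the Writer's Close has run, because Close is what flushes the remaining compressed data and appends the trailer. A Close that is deferred until the enclosing function returns happens after the upload has already read the buffer. The sketch below is an illustration under stated assumptions, with plain strings standing in for cursor rows and gzipRows and verify as hypothetical helper names; it compares a stream read after Close with one read while the Writer is still open.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipRows compresses rows into an in-memory buffer. When closeFirst is
// false the Writer is deliberately left open, which mimics reading the
// buffer while a deferred Close is still pending.
func gzipRows(rows []string, closeFirst bool) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	for _, row := range rows {
		if _, err := io.WriteString(gz, row+"\n"); err != nil {
			return nil, err
		}
	}
	if closeFirst {
		// Close flushes pending data and writes the gzip trailer.
		if err := gz.Close(); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// verify returns nil only if data is a complete, valid gzip stream.
func verify(data []byte) error {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(io.Discard, zr)
	return err
}

func main() {
	// Hypothetical sample rows.
	rows := []string{`{"_id": 1}`, `{"_id": 2}`}

	closed, _ := gzipRows(rows, true)
	open, _ := gzipRows(rows, false)

	fmt.Println("closed before read:", verify(closed))
	fmt.Println("still open at read:", verify(open))
}
```

The first check should report no error, while the second reports a decoding error because the stream is incomplete.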

Solution #2

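Another solution is to use io.Pipe() and goroutines to create a stream that reads and compresses the data on demand instead of into an in-memory buffer. This is useful if the data you are reading is very large and you cannot hold all of it in memory.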

cursor, err := collection.Find(context.TODO(), filter)
if err != nil {
	// handle error somehow ¯\_(ツ)_/¯
}
defer cursor.Close(context.TODO())

// create pipe endpoints
reader, writer := io.Pipe()

// note: io.Pipe() returns a synchronous in-memory pipe
// reads and writes block on one another
// make sure to go through docs once.

// now, since reads and writes on a pipe block
// we must move to a background goroutine else
// all our writes would block forever
go func() {
	// order of defer here is important
	// see: https://stackoverflow.com/a/24720120/6611700
	// make sure gzip stream is closed before the pipe
	// to ensure data is flushed properly
	defer writer.Close()
	gz := gzip.NewWriter(writer)
	defer gz.Close()

	for cursor.Next(context.TODO()) {
		// err is local to the goroutine so it does not race with the caller's err
		if _, err := io.WriteString(gz, cursor.Current.String()+"\n"); err != nil {
			// pass the error to the reading side and stop
			writer.CloseWithError(err)
			return
		}
	}
}()

// you can now use reader as io.Reader
// and it'll contain gzipped data for your serialized rows
// (s3Uploader is the s3manager uploader from the question)
_, err = s3Uploader.Upload(&s3manager.UploadInput{
	Bucket: aws.String("..."),
	Key:    aws.String("..."),
	Body:   reader,
})
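The pipe-based approach can be tried in isolation as well. This is a minimal sketch rather than the answer's exact code: a string slice replaces the MongoDB cursor, a gzip reader replaces the S3 upload followed by gzip -d, and gzipStream and readBack are hypothetical names introduced for the illustration.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
)

// gzipStream returns a reader that yields the gzip-compressed rows.
// Compression runs in a goroutine and advances only as the reader consumes.
func gzipStream(rows []string) io.Reader {
	pr, pw := io.Pipe()
	go func() {
		gz := gzip.NewWriter(pw)
		for _, row := range rows {
			if _, err := io.WriteString(gz, row+"\n"); err != nil {
				// Surface the write error to whoever reads from pr.
				pw.CloseWithError(err)
				return
			}
		}
		// Close gzip first so the trailer reaches the pipe, then close the
		// pipe. CloseWithError(nil) makes the reader see a plain io.EOF.
		pw.CloseWithError(gz.Close())
	}()
	return pr
}

// readBack decompresses everything from r; it plays the role of the
// consumer of the stream in this sketch.
func readBack(r io.Reader) (string, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	// Hypothetical sample rows standing in for cursor documents.
	rows := []string{`{"_id": 1}`, `{"_id": 2}`}

	text, err := readBack(gzipStream(rows))
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

Because the pipe has no internal buffering, only a small amount of data is in flight at any moment, which is the property that makes this variant suitable for result sets that do not fit in memory.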

This question about gzipping the cursor data of a MongoDB find query in Golang comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/61437984/
