
go - Buffered golang channel losing data

Reposted. Author: IT王子. Updated: 2023-10-29 00:38:01

I'm trying to parse a huge Wiktionary dump using goroutines, and I've run into a strange bug: every time the channel blocks, the channel the goroutine reads from seems to lose and corrupt data.

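package main

import (
	"bufio"
	"flag"
	"log"
	"os"
	"sync"
)

// srcFile is assumed to be a command-line flag; the original snippet
// uses *srcFile without showing its declaration.
var srcFile = flag.String("src", "", "path to the Wiktionary dump")

func main() {
	flag.Parse()

	inFile, err := os.Open(*srcFile)
	if err != nil {
		// The original called log.LogErrorf from a non-standard logger.
		log.Printf("Error opening dump: %v", err)
		return
	}
	defer inFile.Close()

	var wg sync.WaitGroup
	input := make(chan []byte, 51)

	// Add must run before the goroutine starts; calling it inside the
	// goroutine races with wg.Wait below.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for line := range input {
			log.Printf("Bytes: %s", line)
			// process the line
		}
	}()

	scanner := bufio.NewScanner(inFile)
	count := 0
	for scanner.Scan() {
		count++
		log.Printf("Scanned: %d", count)
		newestBytes := scanner.Bytes()
		log.Printf("Bytes: %s", newestBytes)
		input <- newestBytes // sends a slice of the scanner's internal buffer
	}
	// Scan returns false on error or EOF, so check Err after the loop.
	if err := scanner.Err(); err != nil {
		log.Printf("Error scanning: %v", err)
	}
	close(input)
	wg.Wait()
}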

When I run it, I get the correct output. Pay particular attention to lines 51 and 52.

2014/08/03 17:49:25 Scanned: 42
2014/08/03 17:49:25 Bytes: <namespace key="115" case="case-sensitive">Citations talk</namespace>
2014/08/03 17:49:25 Scanned: 43
2014/08/03 17:49:25 Bytes: <namespace key="116" case="case-sensitive">Sign gloss</namespace>
2014/08/03 17:49:25 Scanned: 44
2014/08/03 17:49:25 Bytes: <namespace key="117" case="case-sensitive">Sign gloss talk</namespace>
2014/08/03 17:49:25 Scanned: 45
2014/08/03 17:49:25 Bytes: <namespace key="828" case="case-sensitive">Module</namespace>
2014/08/03 17:49:25 Scanned: 46
2014/08/03 17:49:25 Bytes: <namespace key="829" case="case-sensitive">Module talk</namespace>
2014/08/03 17:49:25 Scanned: 47
2014/08/03 17:49:25 Bytes: </namespaces>
2014/08/03 17:49:25 Scanned: 48
2014/08/03 17:49:25 Bytes: </siteinfo>
2014/08/03 17:49:25 Scanned: 49
2014/08/03 17:49:25 Bytes: <page>
2014/08/03 17:49:25 Scanned: 50
2014/08/03 17:49:25 Bytes: <title>Wiktionary:Welcome, newcomers</title>
2014/08/03 17:49:25 Scanned: 51
2014/08/03 17:49:25 Bytes: <ns>4</ns>
2014/08/03 17:49:25 Scanned: 52
2014/08/03 17:49:25 Bytes: <id>6</id>
2014/08/03 17:49:25 Scanned: 53
2014/08/03 17:49:25 Bytes: <restrictions>edit=autoconfirmed:move=sysop</restrictions>
2014/08/03 17:49:25 Scanned: 54
2014/08/03 17:49:25 Bytes: <revision>
2014/08/03 17:49:25 Scanned: 55
2014/08/03 17:49:25 Bytes: <id>24557508</id>
2014/08/03 17:49:25 Scanned: 56
2014/08/03 17:49:25 Bytes: <parentid>19020708</parentid>
2014/08/03 17:49:25 Scanned: 57
2014/08/03 17:49:25 Bytes: <timestamp>2013-12-30T13:50:49Z</timestamp>
2014/08/03 17:49:25 Scanned: 58
2014/08/03 17:49:25 Bytes: <contributor>
2014/08/03 17:49:25 Scanned: 59

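However, when I print line instead (what the goroutine receives), I get the output below. After line 51 the channel blocks, and main scans another 51 lines and passes them into the channel. But the next line the goroutine reads is wrong, and not only that, it is clearly malformed.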

Bytes:       <namespace key="828" case="case-sensitive">Module</namespace>
2014/08/03 17:40:52 Bytes: <namespace key="829" case="case-sensitive">Module talk</namespace>
2014/08/03 17:40:52 Bytes: </namespaces>
2014/08/03 17:40:52 Bytes: </siteinfo>
2014/08/03 17:40:52 Bytes: <page>
2014/08/03 17:40:52 Bytes: <title>Wiktionary:Welcome, newcomers</title>
2014/08/03 17:40:52 Scanned: 52
2014/08/03 17:40:52 Scanned: 53
2014/08/03 17:40:52 Scanned: 54
2014/08/03 17:40:52 Scanned: 55
2014/08/03 17:40:52 Scanned: 56
2014/08/03 17:40:52 Scanned: 57
2014/08/03 17:40:52 Scanned: 58
2014/08/03 17:40:52 Scanned: 59
2014/08/03 17:40:52 Scanned: 60
2014/08/03 17:40:52 Scanned: 61
2014/08/03 17:40:52 Scanned: 62
2014/08/03 17:40:52 Scanned: 63
2014/08/03 17:40:52 Scanned: 64
2014/08/03 17:40:52 Scanned: 65
2014/08/03 17:40:52 Scanned: 66
2014/08/03 17:40:52 Scanned: 67
2014/08/03 17:40:52 Scanned: 68
2014/08/03 17:40:52 Scanned: 69
2014/08/03 17:40:52 Scanned: 70
2014/08/03 17:40:52 Scanned: 71
2014/08/03 17:40:52 Scanned: 72
2014/08/03 17:40:52 Scanned: 73
2014/08/03 17:40:52 Scanned: 74
2014/08/03 17:40:52 Scanned: 75
2014/08/03 17:40:52 Scanned: 76
2014/08/03 17:40:52 Scanned: 77
2014/08/03 17:40:52 Scanned: 78
2014/08/03 17:40:52 Scanned: 79
2014/08/03 17:40:52 Scanned: 80
2014/08/03 17:40:52 Scanned: 81
2014/08/03 17:40:52 Scanned: 82
2014/08/03 17:40:52 Scanned: 83
2014/08/03 17:40:52 Scanned: 84
2014/08/03 17:40:52 Scanned: 85
2014/08/03 17:40:52 Scanned: 86
2014/08/03 17:40:52 Scanned: 87
2014/08/03 17:40:52 Scanned: 88
2014/08/03 17:40:52 Scanned: 89
2014/08/03 17:40:52 Scanned: 90
2014/08/03 17:40:52 Scanned: 91
2014/08/03 17:40:52 Scanned: 92
2014/08/03 17:40:52 Scanned: 93
2014/08/03 17:40:52 Scanned: 94
2014/08/03 17:40:52 Scanned: 95
2014/08/03 17:40:52 Scanned: 96
2014/08/03 17:40:52 Scanned: 97
2014/08/03 17:40:52 Scanned: 98
2014/08/03 17:40:52 Scanned: 99
2014/08/03 17:40:52 Scanned: 100
2014/08/03 17:40:52 Scanned: 101
2014/08/03 17:40:52 Scanned: 102
2014/08/03 17:40:52 Bytes: nd other refer
2014/08/03 17:40:52 Bytes: nce and instru
2014/08/03 17:40:52 Bytes: tional materials. It stipulates that any copy of the material,
2014/08/03 17:40:52 Bytes: even if modifi
2014/08/03 17:40:52 Bytes: d, carry the same licen
2014/08/03 17:40:52 Bytes: e. Those copies may be sold but, if
2014/08/03 17:40:52 Bytes: produced in quantity, have to be made available i
2014/08/03 17:40:52 Bytes: a format which fac
2014/08/03 17:40:52 Bytes: litates further editing.
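A buffered channel of capacity N accepts N sends with no receiver ready; only the next send blocks. That fits the log above, where main keeps scanning while up to 51 lines sit in the channel unprinted. A minimal sketch, where fillUntilBlocked is a hypothetical helper that uses select with a default case to detect the point at which a plain send would block:

```go
package main

import "fmt"

// fillUntilBlocked sends into a channel of capacity n that has no receiver
// and reports how many sends succeeded before a send would have blocked.
func fillUntilBlocked(n int) int {
	ch := make(chan int, n)
	sent := 0
	for {
		select {
		case ch <- sent:
			sent++
		default:
			// The buffer is full: a plain send here would block.
			return sent
		}
	}
}

func main() {
	fmt.Println(fillUntilBlocked(51)) // 51
}
```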

I've tried to reproduce this in the Go playground without success. It seems to have something to do with the way slices are passed through channels.
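Sending a slice over a channel copies only the slice header (pointer, length, capacity), not the bytes it refers to, so sender and receiver share one backing array. A minimal sketch with a hypothetical helper, sendThenMutate, that overwrites the array after the send but before the receive:

```go
package main

import "fmt"

// sendThenMutate sends buf over a buffered channel, then overwrites buf's
// backing array before the value is received. It returns what the receiver sees.
func sendThenMutate() string {
	ch := make(chan []byte, 1)
	buf := []byte("hello")
	ch <- buf          // copies the slice header only
	copy(buf, "HELLO") // writes into the array the queued slice still points at
	return string(<-ch)
}

func main() {
	fmt.Println(sendThenMutate()) // HELLO
}
```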

Best answer

The function Scanner.Bytes may return the same slice that the scanner uses internally.

func (s *Scanner) Bytes() []byte

Bytes returns the most recent token generated by a call to Scan. The underlying array may point to data that will be overwritten by a subsequent call to Scan. It does no allocation.

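According to the documentation, this slice may be overwritten by a subsequent call to Scanner.Scan. Your code does not ensure the slice is no longer in use by the time Scanner.Scan is called again (in fact, it produces lines and consumes them asynchronously), so the data may be overwritten while you are still trying to use it. Every slice waiting in the channel's 51-element buffer points into the scanner's internal buffer; once the scanner shifts or refills that buffer, those slices show whatever bytes now occupy it, which is why the goroutine prints fragments of later lines.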
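The effect can be reproduced without goroutines by keeping the slices from Bytes() and reading them only after scanning ends, which is the same "consume later" pattern as the question. A minimal sketch, assuming an 8-byte scanner buffer (set with Scanner.Buffer) so that reuse happens after only two lines; collect is a hypothetical helper, and the exact garbled contents depend on bufio.Scanner internals, so only the copied result should be relied on:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// collect scans text line by line and keeps every token. When copyTokens is
// false it keeps the slices returned by Bytes() as-is, so they alias the
// scanner's internal buffer.
func collect(text string, copyTokens bool) []string {
	sc := bufio.NewScanner(strings.NewReader(text))
	// Assumption for this demo: a tiny buffer forces early reuse.
	sc.Buffer(make([]byte, 8), 8)

	var kept [][]byte
	for sc.Scan() {
		b := sc.Bytes()
		if copyTokens {
			b = append([]byte(nil), b...) // private copy
		}
		kept = append(kept, b)
	}
	// Read the tokens only after scanning has finished.
	out := make([]string, len(kept))
	for i, b := range kept {
		out[i] = string(b)
	}
	return out
}

func main() {
	text := "aaa\nbbb\nccc\n"
	fmt.Println(collect(text, false)) // may not match the input
	fmt.Println(collect(text, true))  // [aaa bbb ccc]
}
```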

Copy the slice explicitly so that the data cannot be overwritten by later calls to Scanner.Scan:

input <- append([]byte(nil), newestBytes...)

For a similar question about go - buffered golang channel losing data, see Stack Overflow: https://stackoverflow.com/questions/25107540/
