Goroutine not running when called recursively


I'm working on the Web Crawler exercise from A Tour of Go. Here is my solution so far:

func GatherUrls(url string, fetcher Fetcher) []string {
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Printf("found: %s %q\n", url, body)
	}
	return urls
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	// get all urls for depth
	// check if url has been crawled
	//  Y: noop
	//  N: crawl url
	// when depth is 0, stop
	fmt.Printf("crawling %q...\n", url)
	if depth <= 0 {
		return
	}
	urls := GatherUrls(url, fetcher)
	fmt.Println("urls:", urls)
	for _, u := range urls {
		fmt.Println("currentUrl:", u)
		if _, exists := cache[u]; !exists {
			fmt.Printf("about to crawl %q\n", u)
			go Crawl(u, depth-1, fetcher)
		} else {
			cache[u] = true
		}
	}
}

func main() {
	cache = make(map[string]bool)
	Crawl("https://golang.org/", 4, fetcher)
}

When I run this code, Crawl() is never called as the function recurses (I know this because fmt.Printf("crawling %q...\n", url) is only printed once).

Here is the log:

crawling "https://golang.org/"...
found: https://golang.org/ "The Go Programming Language"
urls: [https://golang.org/pkg/ https://golang.org/cmd/]
currentUrl: https://golang.org/pkg/
about to crawl "https://golang.org/pkg/"
currentUrl: https://golang.org/cmd/
about to crawl "https://golang.org/cmd/"

What am I doing wrong? I suspect that spawning a goroutine for the recursion is the wrong approach. Please advise.

Please note that I want to do this with as few libraries as possible. I have seen some answers that use sync.WaitGroup; I don't want to use that.

Note: the full code, including all the boilerplate, is below:

package main

import (
	"fmt"
)

var cache map[string]bool

type Fetcher interface {
	// Fetch returns the body of URL and
	// a slice of URLs found on that page.
	Fetch(url string) (body string, urls []string, err error)
}

func GatherUrls(url string, fetcher Fetcher) []string {
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Printf("found: %s %q\n", url, body)
	}
	return urls
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	// get all urls for depth
	// check if url has been crawled
	//  Y: noop
	//  N: crawl url
	// when depth is 0, stop
	fmt.Printf("crawling %q...\n", url)
	if depth <= 0 {
		return
	}
	urls := GatherUrls(url, fetcher)
	fmt.Println("urls:", urls)
	for _, u := range urls {
		fmt.Println("currentUrl:", u)
		if _, exists := cache[u]; !exists {
			fmt.Printf("about to crawl %q\n", u)
			go Crawl(u, depth-1, fetcher)
		} else {
			cache[u] = true
		}
	}
}

func main() {
	cache = make(map[string]bool)
	Crawl("https://golang.org/", 4, fetcher)
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
	body string
	urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
	if res, ok := f[url]; ok {
		return res.body, res.urls, nil
	}
	return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
	"https://golang.org/": &fakeResult{
		"The Go Programming Language",
		[]string{
			"https://golang.org/pkg/",
			"https://golang.org/cmd/",
		},
	},
	"https://golang.org/pkg/": &fakeResult{
		"Packages",
		[]string{
			"https://golang.org/",
			"https://golang.org/cmd/",
			"https://golang.org/pkg/fmt/",
			"https://golang.org/pkg/os/",
		},
	},
	"https://golang.org/pkg/fmt/": &fakeResult{
		"Package fmt",
		[]string{
			"https://golang.org/",
			"https://golang.org/pkg/",
		},
	},
	"https://golang.org/pkg/os/": &fakeResult{
		"Package os",
		[]string{
			"https://golang.org/",
			"https://golang.org/pkg/",
		},
	},
}

Best Answer
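
First, the reason your recursive calls never run: go Crawl(...) only schedules a goroutine, and the outer Crawl returns right away. Once main returns, the program exits, and exiting does not wait for outstanding goroutines, so the spawned crawls are killed before they get a chance to print anything. A minimal sketch of the same effect (my own illustration, not part of the exercise code):

package main

import "fmt"

func main() {
	// The goroutine is scheduled, but main returns immediately, and
	// program exit does not wait for other goroutines, so this
	// usually prints nothing.
	go fmt.Println("you will probably never see this")
}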

As you can see in this exercise: https://tour.golang.org/concurrency/10, we should handle the following tasks:

  • Fetch URLs in parallel.
  • Don't fetch the same URL twice.
  • Cache URLs that have already been fetched in a map, but a map alone is not safe for concurrent use (a short demonstration follows this list)!
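
To see why an unguarded map is a problem, here is a minimal sketch (my own, not from the exercise): run it with go run -race and the race detector reports a data race; even without the flag, the runtime may abort with "fatal error: concurrent map writes".

package main

func main() {
	m := make(map[string]bool)
	done := make(chan bool)
	go func() {
		for i := 0; i < 1000; i++ {
			m["a"] = true // write from a second goroutine
		}
		done <- true
	}()
	for i := 0; i < 1000; i++ {
		m["b"] = true // concurrent write from the main goroutine
	}
	<-done
}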

So we can solve those tasks with the following steps:

Create a struct to store the fetch result:

type Result struct {
	body string
	urls []string
	err  error
}

Create a struct that stores the URLs already fetched in a map. For this we need sync.Mutex, which is not introduced in 'A Tour of Go':

type Cache struct {
	store map[string]bool
	mux   sync.Mutex
}
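
For illustration only, the lock/unlock pattern could be wrapped in a check-and-mark helper like the hypothetical Visited method below; the Crawl function in this answer inlines the same logic instead.

// Visited reports whether url has been fetched before, and marks it as
// fetched if not. Holding the mutex makes the check-and-set atomic, so
// two goroutines can never both claim the same URL.
// (Hypothetical helper, not part of the original answer.)
func (c *Cache) Visited(url string) bool {
	c.mux.Lock()
	defer c.mux.Unlock()
	if c.store[url] {
		return true
	}
	c.store[url] = true
	return false
}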

Fetch URLs and bodies in parallel: add each URL to the cache as it is fetched, but first we need to lock the map's reads and writes with the mutex so concurrent goroutines cannot corrupt it. With that, we can modify the Crawl function like this:

func Crawl(url string, depth int, fetcher Fetcher) {
	if depth <= 0 {
		return
	}

	ch := make(chan Result)

	go func(url string, res chan Result) {
		body, urls, err := fetcher.Fetch(url)

		if err != nil {
			res <- Result{body, urls, err}
			return
		}

		// Filter out URLs that were already fetched; the mutex guards
		// the map against concurrent reads and writes.
		var furls []string
		cache.mux.Lock()
		for _, u := range urls {
			if _, exists := cache.store[u]; !exists {
				furls = append(furls, u)
			}
			cache.store[u] = true
		}
		cache.mux.Unlock()

		res <- Result{body: body, urls: furls, err: err}

	}(url, ch)

	// Wait for this page's result before recursing into its links.
	res := <-ch

	if res.err != nil {
		fmt.Println(res.err)
		return
	}

	fmt.Printf("found: %s %q\n", url, res.body)

	for _, u := range res.urls {
		Crawl(u, depth-1, fetcher)
	}
}
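
The Crawl above relies on a package-level cache variable that is not shown here; the authoritative, runnable version is in the playground link below. A minimal sketch of the missing pieces (my assumption, including pre-marking the seed URL so that a page linking back to it does not put it into the crawl list again):

var cache = Cache{store: make(map[string]bool)}

func main() {
	// Mark the seed as fetched up front; no goroutines are running
	// yet, so touching the map without the lock is safe here.
	cache.store["https://golang.org/"] = true
	Crawl("https://golang.org/", 4, fetcher)
}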

You can see the full code and run it in the playground: https://play.golang.org/p/iY9uBXchx3w

Hope this helps.

Regarding this question about a goroutine not running when called recursively, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50790190/
