
go - How to make concurrent GET requests from a pool of URLs


I've completed the suggested Tour of Go and watched some tutorials and Gopher conference talks on YouTube, and that's about it.
I have a project that requires me to send GET requests and store the results in files. The catch is that the number of URLs is around 80 million.
I'm only using 1,000 URLs for testing.
The problem: even though I followed some guidelines, I don't think I managed to make it concurrent. I can't tell what's wrong. Then again, maybe I'm mistaken and it is concurrent, but it doesn't seem fast to me; the speed feels like sequential requests.
Here is the code I wrote:

package main

import (
	"bufio"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"sync"
	"time"
)

var wg sync.WaitGroup // synchronization to wait for all the goroutines

func crawler(urlChannel <-chan string) {
	defer wg.Done()
	client := &http.Client{Timeout: 10 * time.Second} // single client is sufficient for multiple requests

	for urlItem := range urlChannel {
		req1, _ := http.NewRequest("GET", "http://"+urlItem, nil) // generating the request
		req1.Header.Add("User-agent", "Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/74.0") // changing user-agent
		resp1, respErr1 := client.Do(req1) // sending the prepared request and getting the response
		if respErr1 != nil {
			continue
		}

		defer resp1.Body.Close()

		if resp1.StatusCode/100 == 2 { // means server responded with 2xx code
			text1, readErr1 := ioutil.ReadAll(resp1.Body) // try to read the sourcecode of the website
			if readErr1 != nil {
				log.Fatal(readErr1)
			}

			f1, fileErr1 := os.Create("200/" + urlItem + ".txt") // creating the relative file
			if fileErr1 != nil {
				log.Fatal(fileErr1)
			}
			defer f1.Close()

			_, writeErr1 := f1.Write(text1) // writing the sourcecode into our file
			if writeErr1 != nil {
				log.Fatal(writeErr1)
			}
		}
	}
}

func main() {
	file, err := os.Open("urls.txt") // the file containing the url's
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close() // don't forget to close the file

	urlChannel := make(chan string, 1000) // create a channel to store all the url's

	scanner := bufio.NewScanner(file) // each line has another url
	for scanner.Scan() {
		urlChannel <- scanner.Text()
	}
	close(urlChannel)

	_ = os.Mkdir("200", 0755) // if it's there, it will create an error, and we will simply ignore it
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go crawler(urlChannel)
	}
	wg.Wait()
}
My questions are: why isn't this code running concurrently? How can I solve the problem described above? Am I doing something wrong when making concurrent GET requests?

Best Answer

A good guideline to follow when setting up concurrent pipelines is to always set up and instantiate the listeners that will execute concurrently (the crawlers, in your case) first, and only then start feeding them data through the pipeline (urlChannel, in your case).
In your example, the only thing preventing a deadlock is that you instantiated a buffered channel whose capacity matches the number of lines in your test file (1000 lines). The code puts the URLs into urlChannel, and since the file has 1000 lines, urlChannel can accept all of them without blocking. If you put more URLs in the file, execution will block as soon as urlChannel fills up.
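To see that blocking behaviour in isolation, here is a minimal sketch (independent of the crawler code): without any receiver running, sends succeed only while the buffer has room, and the next send can never complete.

package main

import "fmt"

func main() {
	ch := make(chan string, 2) // buffered channel with capacity 2, no receiver started

	ch <- "a" // accepted, buffer has room
	ch <- "b" // accepted, buffer is now full
	fmt.Println("both sends completed without a receiver")

	// With nobody receiving from ch, a third send can never complete;
	// uncommenting the next line makes the runtime report
	// "fatal error: all goroutines are asleep - deadlock!"
	// ch <- "c"
}

Your main is in exactly that position: it only gets away with filling the channel before starting any crawlers because the 1000-slot buffer happens to hold the whole test file.
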
Here is a version of the code that should work:

package main

import (
	"bufio"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"sync"
	"time"
)

func crawler(wg *sync.WaitGroup, urlChannel <-chan string) {
	defer wg.Done()
	client := &http.Client{Timeout: 10 * time.Second} // single client is sufficient for multiple requests

	for urlItem := range urlChannel {
		req1, _ := http.NewRequest("GET", "http://"+urlItem, nil) // generating the request
		req1.Header.Add("User-agent", "Mozilla/5.0 (X11; Linux i586; rv:31.0) Gecko/20100101 Firefox/74.0") // changing user-agent
		resp1, respErr1 := client.Do(req1) // sending the prepared request and getting the response
		if respErr1 != nil {
			continue
		}

		if resp1.StatusCode/100 == 2 { // means server responded with 2xx code
			text1, readErr1 := ioutil.ReadAll(resp1.Body) // try to read the sourcecode of the website
			if readErr1 != nil {
				log.Fatal(readErr1)
			}
			resp1.Body.Close()

			f1, fileErr1 := os.Create("200/" + urlItem + ".txt") // creating the relative file
			if fileErr1 != nil {
				log.Fatal(fileErr1)
			}

			_, writeErr1 := f1.Write(text1) // writing the sourcecode into our file
			if writeErr1 != nil {
				log.Fatal(writeErr1)
			}
			f1.Close()
		} else {
			resp1.Body.Close() // close the body for non-2xx responses too, so connections are not leaked
		}
	}
}

func main() {
	var wg sync.WaitGroup
	file, err := os.Open("urls.txt") // the file containing the url's
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close() // don't forget to close the file

	urlChannel := make(chan string)

	_ = os.Mkdir("200", 0755) // if it's there, it will create an error, and we will simply ignore it

	// first, initialize crawlers
	wg.Add(10)
	for i := 0; i < 10; i++ {
		go crawler(&wg, urlChannel)
	}

	// after crawlers are initialized, start feeding them data through the channel
	scanner := bufio.NewScanner(file) // each line has another url
	for scanner.Scan() {
		urlChannel <- scanner.Text()
	}
	close(urlChannel)
	wg.Wait()
}
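
If you want to convince yourself that the requests really are handled concurrently, a quick check is to time a simulated run in which a fixed delay stands in for each HTTP request. This is only an illustrative sketch (the fake host names and the 100 ms delay are made up), but it uses the same start-workers-first pattern as the fixed code above:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	ch := make(chan string)
	var wg sync.WaitGroup

	start := time.Now()

	// start the 10 workers first, exactly as in the fixed crawler
	for w := 0; w < 10; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for u := range ch {
				time.Sleep(100 * time.Millisecond) // stand-in for one HTTP request
				fmt.Printf("worker %d finished %s\n", id, u)
			}
		}(w)
	}

	// then feed them 100 fake URLs
	for i := 0; i < 100; i++ {
		ch <- fmt.Sprintf("example-%d.com", i)
	}
	close(ch)
	wg.Wait()

	// 100 "requests" of ~100 ms each: roughly 10 s sequentially, roughly 1 s with 10 workers
	fmt.Println("elapsed:", time.Since(start))
}

If the elapsed time lands near the sequential estimate rather than around a tenth of it, the workers are effectively running one after another.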

Regarding go - How to make concurrent GET requests from a pool of URLs, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64632465/
