
c# - Do you need a background worker or multiple threads to fire multiple async HttpWebRequests?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:56:09

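Overall goal

I'm trying to call the Google PageSpeed Insights API for multiple input URLs read from a .txt file, and to output the results to a .csv.

What I tried

I wrote a console app that fires these requests, adds each result to a list as it returns, and, once they have all completed, writes the list to the .csv file (async got a bit crazy when I tried writing responses to the .csv immediately).

My code is below and is far from optimized. I come from a JavaScript background, where I don't usually use web workers or any other managed new threads, so I was trying to do the same in C#.

  1. Can I run these multiple WebRequests and write them to a collection (or output file) without using multiple threads, and have them all run in parallel, without waiting for each request to come back before handling the next one?
  2. Is there a cleaner way to do this with callbacks?
  3. If threads or BackgroundWorkers are needed, what is a Clean Code way of doing this?

Initial example code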

    static void Main(string[] args)
    {
        Console.WriteLine("Begin Google PageSpeed Insights!");

        appMode = ConfigurationManager.AppSettings["ApplicationMode"];
        var inputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["InputFile"];
        var outputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["OutputFile"];

        var inputLines = File.ReadAllLines(inputFilePath).ToList();

        if (File.Exists(outputFilePath))
        {
            File.Delete(outputFilePath);
        }

        List<string> outputCache = new List<string>();

        foreach (var line in inputLines)
        {
            var requestDataFromPsi = CallPsiForPrimaryStats(line);
            Console.WriteLine($"Got response of {requestDataFromPsi.Result}");

            outputCache.Add(requestDataFromPsi.Result);
        }

        var writeTask = WriteCharacters(outputCache, outputFilePath);

        writeTask.Wait();

        Console.WriteLine("End Google PageSpeed Insights");
    }

    private static async Task<string> CallPsiForPrimaryStats(string url)
    {
        HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create($"https://www.googleapis.com/pagespeedonline/v2/runPagespeed?url={url}&strategy=mobile&key={API_KEY}");
        myReq.Method = WebRequestMethods.Http.Get;
        myReq.Timeout = 60000;
        myReq.Proxy = null;
        myReq.ContentType = "application/json";

        Task<WebResponse> task = Task.Factory.FromAsync(
            myReq.BeginGetResponse,
            asyncResult => myReq.EndGetResponse(asyncResult),
            (object)null);

        return await task.ContinueWith(t => ReadStreamFromResponse(t.Result));
    }

    private static string ReadStreamFromResponse(WebResponse response)
    {
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(responseStream))
        {
            string jsonResponse = sr.ReadToEnd();
            dynamic jsonObj = Newtonsoft.Json.JsonConvert.DeserializeObject(jsonResponse);

            var psiResp = new PsiResponse()
            {
                Url = jsonObj.id,
                SpeedScore = jsonObj.ruleGroups.SPEED.score,
                UsabilityScore = jsonObj.ruleGroups.USABILITY.score,
                NumberResources = jsonObj.pageStats.numberResources,
                NumberHosts = jsonObj.pageStats.numberHosts,
                TotalRequestBytes = jsonObj.pageStats.totalRequestBytes,
                NumberStaticResources = jsonObj.pageStats.numberStaticResources,
                HtmlResponseBytes = jsonObj.pageStats.htmlResponseBytes,
                CssResponseBytes = jsonObj.pageStats.cssResponseBytes,
                ImageResponseBytes = jsonObj.pageStats.imageResponseBytes,
                JavascriptResponseBytes = jsonObj.pageStats.javascriptResponseBytes,
                OtherResponseBytes = jsonObj.pageStats.otherResponseBytes,
                NumberJsResources = jsonObj.pageStats.numberJsResources,
                NumberCssResources = jsonObj.pageStats.numberCssResources,
            };
            return CreateOutputString(psiResp);
        }
    }

    static async Task WriteCharacters(List<string> inputs, string outputFilePath)
    {
        using (StreamWriter fileWriter = new StreamWriter(outputFilePath))
        {
            await fileWriter.WriteLineAsync(TABLE_HEADER);

            foreach (var input in inputs)
            {
                await fileWriter.WriteLineAsync(input);
            }
        }
    }

    private static string CreateOutputString(PsiResponse psiResponse)
    {
        var stringToWrite = "";

        foreach (var prop in psiResponse.GetType().GetProperties())
        {
            stringToWrite += $"{prop.GetValue(psiResponse, null)},";
        }
        Console.WriteLine(stringToWrite);
        return stringToWrite;
    }

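Update: after refactoring based on Stephen Cleary's tips

The problem is that this still runs slowly. The original took 20 minutes, and after the refactor it still takes 20 minutes. It seems to be throttled somewhere, possibly by Google on the PageSpeed API. I tested it by calling https://www.google.com/, https://www.yahoo.com/, https://www.bing.com/ and 18 others, and it still ran slowly, with a bottleneck of only about 5 requests being processed at a time. I tried refactoring to run 5 test URLs, write to the file, and repeat, but that only marginally sped up the process.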

    static void Main(string[] args) { MainAsync(args).Wait(); }
    static async Task MainAsync(string[] args)
    {
        Console.WriteLine("Begin Google PageSpeed Insights!");

        appMode = ConfigurationManager.AppSettings["ApplicationMode"];
        var inputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["InputFile"];
        var outputFilePath = READ_WRITE_PATH + ConfigurationManager.AppSettings["OutputFile"];

        var inputLines = File.ReadAllLines(inputFilePath).ToList();

        if (File.Exists(outputFilePath))
        {
            File.Delete(outputFilePath);
        }

        var tasks = inputLines.Select(line => CallPsiForPrimaryStats(line));
        var outputCache = await Task.WhenAll(tasks);

        await WriteCharacters(outputCache, outputFilePath);

        Console.WriteLine("End Google PageSpeed Insights");
    }

    private static async Task<string> CallPsiForPrimaryStats(string url)
    {
        HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create($"https://www.googleapis.com/pagespeedonline/v2/runPagespeed?url={url}&strategy=mobile&key={API_KEY}");
        myReq.Method = WebRequestMethods.Http.Get;
        myReq.Timeout = 60000;
        myReq.Proxy = null;
        myReq.ContentType = "application/json";
        Console.WriteLine($"Start call: {url}");

        // Try using `HttpClient()` later
        //var myReq2 = new HttpClient();
        //await myReq2.GetAsync($"https://www.googleapis.com/pagespeedonline/v2/runPagespeed?url={url}&strategy=mobile&key={API_KEY}");

        Task<WebResponse> task = Task.Factory.FromAsync(
            myReq.BeginGetResponse,
            myReq.EndGetResponse,
            (object)null);
        var result = await task;
        return ReadStreamFromResponse(result);
    }

    private static string ReadStreamFromResponse(WebResponse response)
    {
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(responseStream))
        {
            string jsonResponse = sr.ReadToEnd();
            dynamic jsonObj = Newtonsoft.Json.JsonConvert.DeserializeObject(jsonResponse);

            var psiResp = new PsiResponse()
            {
                Url = jsonObj.id,
                SpeedScore = jsonObj.ruleGroups.SPEED.score,
                UsabilityScore = jsonObj.ruleGroups.USABILITY.score,
                NumberResources = jsonObj.pageStats.numberResources,
                NumberHosts = jsonObj.pageStats.numberHosts,
                TotalRequestBytes = jsonObj.pageStats.totalRequestBytes,
                NumberStaticResources = jsonObj.pageStats.numberStaticResources,
                HtmlResponseBytes = jsonObj.pageStats.htmlResponseBytes,
                CssResponseBytes = jsonObj.pageStats.cssResponseBytes,
                ImageResponseBytes = jsonObj.pageStats.imageResponseBytes,
                JavascriptResponseBytes = jsonObj.pageStats.javascriptResponseBytes,
                OtherResponseBytes = jsonObj.pageStats.otherResponseBytes,
                NumberJsResources = jsonObj.pageStats.numberJsResources,
                NumberCssResources = jsonObj.pageStats.numberCssResources,
            };
            return CreateOutputString(psiResp);
        }
    }

    static async Task WriteCharacters(IEnumerable<string> inputs, string outputFilePath)
    {
        using (StreamWriter fileWriter = new StreamWriter(outputFilePath))
        {
            await fileWriter.WriteLineAsync(TABLE_HEADER);

            foreach (var input in inputs)
            {
                await fileWriter.WriteLineAsync(input);
            }
        }
    }

    private static string CreateOutputString(PsiResponse psiResponse)
    {
        var stringToWrite = "";
        foreach (var prop in psiResponse.GetType().GetProperties())
        {
            stringToWrite += $"{prop.GetValue(psiResponse, null)},";
        }
        Console.WriteLine(stringToWrite);
        return stringToWrite;
    }

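Best answer

Can I run these multiple WebRequests and write them to a collection (or output file) without using multiple threads and have them all run in parallel, not having to wait for each request to come back before handling the next one?

Yes; what you're looking for is asynchronous concurrency, which uses Task.WhenAll.

Is there a cleaner way to do this with callbacks?

async/await is cleaner than callbacks. JavaScript has moved from callbacks, to promises (similar to Task<T> in C#), to async/await (very similar to async/await in C#). The cleanest solution in both languages is now async/await.

There are a few gotchas in C#, though, mainly due to backwards compatibility.

1) In an async console app, you do need to block the Main method. Generally speaking, this is the only time you should block on asynchronous code: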

    static void Main(string[] args) { MainAsync(args).Wait(); }
    static async Task MainAsync(string[] args)
    {

Once you have an async MainAsync method, you can use Task.WhenAll for asynchronous concurrency:

        ...
        var tasks = inputLines.Select(line => CallPsiForPrimaryStats(line));
        var outputCache = await Task.WhenAll(tasks);
        await WriteCharacters(outputCache, outputFilePath);
        ...
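
As a rough, self-contained illustration of the Task.WhenAll pattern, the sketch below swaps the real PageSpeed call for a hypothetical FakeCallAsync that only waits on Task.Delay (the names FakeCallAsync and RunAllAsync are made up for this example and are not part of the original code). It is meant to show only that Task.WhenAll lets the waits overlap and hands back results in input order.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for CallPsiForPrimaryStats: waits, then returns a string.
    public static async Task<string> FakeCallAsync(string url, int delayMs)
    {
        await Task.Delay(delayMs); // no thread is blocked while waiting
        return $"done: {url}";
    }

    public static async Task<string[]> RunAllAsync(string[] urls, int delayMs)
    {
        // Select is lazy; Task.WhenAll enumerates it, which starts every call.
        var tasks = urls.Select(u => FakeCallAsync(u, delayMs));
        // The result array follows the order of the input sequence.
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        var urls = new[] { "a", "b", "c" };
        var sw = Stopwatch.StartNew();
        // Blocking here mirrors the Main/MainAsync pattern from point 1.
        var results = RunAllAsync(urls, 500).GetAwaiter().GetResult();
        sw.Stop();

        Console.WriteLine(string.Join(", ", results)); // done: a, done: b, done: c
        Console.WriteLine($"elapsed: {sw.ElapsedMilliseconds} ms");
    }
}
```

With three 500 ms calls, the elapsed time should land near 500 ms rather than 1500 ms, because no call waits for the previous one to finish.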

2) You shouldn't use ContinueWith; it is a low-level, dangerous API. Use await instead:

    private static async Task<string> CallPsiForPrimaryStats(string url)
    {
        ...
        Task<WebResponse> task = Task.Factory.FromAsync(
            myReq.BeginGetResponse,
            myReq.EndGetResponse,
            (object)null);
        var result = await task;
        return ReadStreamFromResponse(result);
    }
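
One concrete difference between the two styles, offered as an illustration rather than as the answer's full reasoning, is how errors surface. The sketch below assumes .NET Framework 4.6 or later (for Task.FromException) and uses two hypothetical helpers, FailAsync and ObserveAsync, to compare the exception type seen through ContinueWith plus t.Result with the one seen through a plain await.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinueWithSketch
{
    // Hypothetical failing operation standing in for a request that errors out.
    public static Task<int> FailAsync()
    {
        return Task.FromException<int>(new InvalidOperationException("boom"));
    }

    // Awaits the task and reports the type name of whatever exception it throws.
    public static async Task<string> ObserveAsync(Task<int> task)
    {
        try
        {
            await task;
            return "no exception";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    public static void Main()
    {
        // ContinueWith + t.Result: reading Result throws an AggregateException
        // that wraps the original error, and that wrapper is what the caller sees.
        var viaContinueWith = ObserveAsync(FailAsync().ContinueWith(t => t.Result))
            .GetAwaiter().GetResult();

        // Plain await: the original exception is rethrown directly.
        var viaAwait = ObserveAsync(FailAsync()).GetAwaiter().GetResult();

        Console.WriteLine($"ContinueWith: {viaContinueWith}"); // AggregateException
        Console.WriteLine($"await:        {viaAwait}");        // InvalidOperationException
    }
}
```

With await, a catch block written for the specific exception type keeps working; with ContinueWith and t.Result, the caller has to unwrap the AggregateException first.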

3) There are often more "async-friendly" types available. In this case, consider using HttpClient instead of HttpWebRequest; you'll find your code cleans up quite a bit.
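
A minimal sketch of what the HttpClient version might look like follows. The endpoint is the one from the question; the helper names BuildPsiUrl and GetPsiJsonAsync, the PSI_API_KEY environment variable, and the two sample URLs are assumptions made for this example, and parsing the JSON into a PsiResponse is left out.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpClientSketch
{
    // A single shared HttpClient; the 60 s timeout mirrors the question's setting.
    private static readonly HttpClient Client =
        new HttpClient { Timeout = TimeSpan.FromSeconds(60) };

    // Hypothetical helper: builds the request URL and escapes the query values.
    public static string BuildPsiUrl(string url, string apiKey)
    {
        return "https://www.googleapis.com/pagespeedonline/v2/runPagespeed"
             + $"?url={Uri.EscapeDataString(url)}&strategy=mobile&key={Uri.EscapeDataString(apiKey)}";
    }

    // Hypothetical helper: returns the raw JSON body for one page.
    public static async Task<string> GetPsiJsonAsync(string url, string apiKey)
    {
        return await Client.GetStringAsync(BuildPsiUrl(url, apiKey));
    }

    public static void Main(string[] args)
    {
        // PSI_API_KEY is an assumed environment variable name, not from the original post.
        var apiKey = Environment.GetEnvironmentVariable("PSI_API_KEY") ?? "";
        var urls = new[] { "https://www.google.com/", "https://www.bing.com/" };

        // Start every request, then wait for all of them together.
        var tasks = urls.Select(u => GetPsiJsonAsync(u, apiKey));
        var bodies = Task.WhenAll(tasks).GetAwaiter().GetResult();

        foreach (var body in bodies)
        {
            Console.WriteLine($"received {body.Length} characters");
        }
    }
}
```

Compared with the HttpWebRequest version, there is no FromAsync wrapper and no manual stream reading; GetStringAsync already returns a Task<string> that can be awaited or passed to Task.WhenAll.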

Regarding "c# - Do you need a background worker or multiple threads to fire multiple async HttpWebRequests?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45059008/
