c# - More efficient way to call GetStringAsync multiple times?

Reposted. Author: 行者123. Updated: 2023-11-30 12:22:03

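I have a list of about 1000 urls, and I was wondering whether there is a more efficient way to call multiple urls from the same site (I have already changed ServicePointManager.DefaultConnectionLimit).

Also, is it better to reuse the same HttpClient, or to create a new one on every call? The code below uses a single client rather than several.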

using (var client = new HttpClient { Timeout = new TimeSpan(0, 5, 0) })
{
    var tasks = urls.Select(async url =>
    {
        await client.GetStringAsync(url).ContinueWith(response =>
        {
            var resultHtml = response.Result;
            //process the html

        });
    }).ToList();

    Task.WaitAll(tasks.ToArray());
}
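
For comparison, here is a minimal sketch of the same loop written with a plain await instead of ContinueWith, and with Task.WhenAll instead of the blocking Task.WaitAll. The names StubHandler, Downloader and FetchAllAsync are hypothetical and introduced only for illustration; the stub handler stands in for the real site so that the sketch runs without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stub: answers every request locally instead of hitting the network.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("<html>" + request.RequestUri + "</html>")
        };
        return Task.FromResult(response);
    }
}

public static class Downloader
{
    // Hypothetical helper: fetches every url through one shared HttpClient.
    public static async Task<string[]> FetchAllAsync(HttpClient client, IEnumerable<string> urls)
    {
        var tasks = urls.Select(async url =>
        {
            // Awaiting directly yields the string; no ContinueWith or .Result needed.
            var html = await client.GetStringAsync(url);
            // process the html here
            return html;
        });

        // WhenAll completes asynchronously and keeps results in input order.
        return await Task.WhenAll(tasks);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var urls = new[] { "http://example.com/a", "http://example.com/b" };
        using (var client = new HttpClient(new StubHandler()) { Timeout = TimeSpan.FromMinutes(5) })
        {
            var pages = await Downloader.FetchAllAsync(client, urls);
            Console.WriteLine(pages.Length);
        }
    }
}
```

Note that this sketch starts all requests at once, just like the code above; it does not limit concurrency by itself.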

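As suggested by @cory, here is the code modified to use TPL Dataflow. However, I had to set MaxDegreeOfParallelism = 100 to reach roughly the same speed as the task-based version. Can the code below be improved?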

var downloader = new ActionBlock<string>(async url =>
{
    var client = new WebClient();
    var resultHtml = await client.DownloadStringTaskAsync(new Uri(url));

}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 100 });

foreach (var url in urls)
{
    downloader.Post(url);
}
downloader.Complete();
downloader.Completion.Wait();

Final version

public void DownloadUrlContents(List<string> urls)
{
    var watch = Stopwatch.StartNew();

    var httpClient = new HttpClient();
    var downloader = new ActionBlock<string>(async url =>
    {
        var data = await httpClient.GetStringAsync(url);
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 100 });

    Parallel.ForEach(urls, (url) =>
    {
        downloader.SendAsync(url);
    });
    downloader.Complete();
    downloader.Completion.Wait();

    Console.WriteLine($"{MethodBase.GetCurrentMethod().Name} {watch.Elapsed}");
}

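Best answer

While your code will work, it is common practice to put a buffer block in front of your ActionBlock. Why do this? The first reason is the task queue size: you can easily balance the number of messages in the queue. The second reason is that adding a message to the buffer is almost instant, and after that TPL Dataflow takes responsibility for processing all of your items: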

// async method here
public async Task DownloadUrlContents(List<string> urls)
{
    var watch = Stopwatch.StartNew();

    var httpClient = new HttpClient();

    // you may limit the buffer size here
    var buffer = new BufferBlock<string>();
    var downloader = new ActionBlock<string>(async url =>
    {
        var data = await httpClient.GetStringAsync(url);
        // handle data here
    },
    // note processor count usage here
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });
    // notify TPL Dataflow to send messages from buffer to loader
    buffer.LinkTo(downloader, new DataflowLinkOptions { PropagateCompletion = true });

    foreach (var url in urls)
    {
        // do await here
        await buffer.SendAsync(url);
    }
    // queue is done
    buffer.Complete();

    // now it's safe to wait for completion of the downloader
    await downloader.Completion;

    // nameof is used because MethodBase.GetCurrentMethod() inside an async method
    // reports the compiler-generated MoveNext rather than this method's name
    Console.WriteLine($"{nameof(DownloadUrlContents)} {watch.Elapsed}");
}
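
The "you may limit the buffer size" comment can be illustrated with a minimal sketch, assuming the System.Threading.Tasks.Dataflow package is available. BoundedPipeline, ProcessAsync and their parameters are hypothetical names chosen for this example, and Task.Delay stands in for the real download so the sketch runs without network access. One detail worth noting: an ActionBlock has an unbounded input queue by default, so bounding only the BufferBlock would let it drain straight into the ActionBlock; the sketch therefore sets BoundedCapacity on both blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedPipeline
{
    // Hypothetical helper: pushes each url through a bounded buffer into a worker block
    // and returns how many items were processed.
    public static async Task<int> ProcessAsync(
        IEnumerable<string> urls, Func<string, Task> work, int capacity, int parallelism)
    {
        var processed = 0;

        // At most `capacity` urls wait in the buffer at any time.
        var buffer = new BufferBlock<string>(
            new DataflowBlockOptions { BoundedCapacity = capacity });

        var worker = new ActionBlock<string>(async url =>
        {
            await work(url);
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = parallelism,
            // Bound the worker too, otherwise it accepts everything the buffer offers.
            BoundedCapacity = parallelism
        });

        buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var url in urls)
        {
            // SendAsync waits asynchronously while the buffer is full (backpressure).
            await buffer.SendAsync(url);
        }
        buffer.Complete();

        await worker.Completion;
        return processed;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var urls = new List<string>();
        for (var i = 0; i < 20; i++) urls.Add("http://example.com/" + i);

        // Task.Delay simulates a download; swap in a real HttpClient call as needed.
        var count = await BoundedPipeline.ProcessAsync(
            urls, url => Task.Delay(10), capacity: 5, parallelism: 4);
        Console.WriteLine(count);
    }
}
```

With bounded blocks, memory use stays roughly proportional to capacity plus parallelism rather than to the length of the url list, at the cost of the producer loop pausing whenever the pipeline is full.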

Regarding "c# - More efficient way to call GetStringAsync multiple times?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42662040/