.net - How can I make an IPropagatorBlock<TInput, TOutput> stop automatically?

Reposted by: 行者123  Updated: 2023-12-04 16:50:37

Suppose I start with a TransformBlock<Uri, string> (which is itself an implementation of IPropagatorBlock<Uri, string>) that takes a Uri and fetches its content as a string (this is a web crawler):

var downloader = new TransformBlock<Uri, string>(async uri => {
    // Download and return string asynchronously...
});

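Once I have the content in a string, I parse it for links. Since one page can contain multiple links, I use a TransformManyBlock<string, Uri> to map the single result (the content) to many links: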

// The discovered item block.
var parser = new TransformManyBlock<string, Uri>(s => {
    // Parse the content here, return an IEnumerable<Uri>.
});

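The key point about the parser is that it can return an empty sequence, indicating that there are no more items to parse.

However, that only applies to one branch of the tree (or one part of the web).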
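To make the empty-sequence signal concrete, here is a minimal sketch of a parsing function. The LinkParser class, the ExtractLinks method and the naive href regular expression are hypothetical and for illustration only; they are not part of the question, and a real crawler would use a proper HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LinkParser
{
    // Naive pattern: matches href="..." attributes with double quotes.
    private static readonly Regex Href =
        new Regex("href\\s*=\\s*\"(?<url>[^\"]+)\"", RegexOptions.IgnoreCase);

    public static IEnumerable<Uri> ExtractLinks(string content)
    {
        // Nothing to parse: an empty sequence ends this branch.
        if (string.IsNullOrEmpty(content)) return Enumerable.Empty<Uri>();

        // Keep only absolute URLs; a page without links also yields
        // an empty sequence.
        return Href.Matches(content)
            .Cast<Match>()
            .Select(m => m.Groups["url"].Value)
            .Where(u => Uri.IsWellFormedUriString(u, UriKind.Absolute))
            .Select(u => new Uri(u))
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string page = "<a href=\"http://example.com/a\">a</a> " +
                      "<a href=\"http://example.com/b\">b</a>";

        foreach (Uri link in LinkParser.ExtractLinks(page))
            Console.WriteLine(link);

        // A page without links produces no output items at all.
        Console.WriteLine(LinkParser.ExtractLinks("no links here").Count());
    }
}
```

Such a function could be passed to the parser block as s => LinkParser.ExtractLinks(s). Note that returning an empty sequence only ends one branch; it does not complete the block.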

I then link the downloader to the parser, and the parser back to the downloader, like so:

downloader.LinkTo(parser);
parser.LinkTo(downloader);

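Now, I know I can make everything stop from outside the blocks (by calling Complete on one of them), but how can I signal that it is complete from inside the blocks?

Or do I have to manage this state myself somehow?

Right now it just hangs, because once everything has been downloaded and parsed, the downloader block is starved.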
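For reference, stopping a block from the outside looks roughly like the following sketch. It assumes the System.Threading.Tasks.Dataflow library is referenced; the summing ActionBlock is a stand-in chosen for illustration, not code from the question.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static void Main()
    {
        int sum = 0;

        // With default options the block handles one message at a time,
        // so the captured variable needs no extra locking here.
        var consumer = new ActionBlock<int>(n => sum += n);

        for (int i = 1; i <= 5; i++) consumer.Post(i);

        // Signal from the outside that no more input will arrive...
        consumer.Complete();

        // ...then wait until the already queued messages are processed.
        consumer.Completion.Wait();

        Console.WriteLine(sum); // 15
    }
}
```

In a cyclic pipeline there is no such obvious point at which outside code knows that no more input will arrive, which is what the question is about.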

Here is a fully self-contained test method that hangs on the call to Wait:

[TestMethod]
public void TestSpider()
{
    // The list of numbers.
    var numbers = new[] { 1, 2 };

    // Transforms from an int to a string.
    var downloader = new TransformBlock<Tuple<int, string>, string>(
        t => t.Item2 + t.Item1.ToString(CultureInfo.InvariantCulture),

        // Let's assume four downloads to a domain at a time.
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 }
    );

    // Gets the next set of strings.
    var parser = new TransformManyBlock<string, Tuple<int, string>>(s => {
        // If the string length is greater than three, return an
        // empty sequence.
        // This is the signal for this branch to stop.
        if (s.Length > 3) return Enumerable.Empty<Tuple<int, string>>();

        // Branch out.
        return numbers.Select(n => new Tuple<int, string>(n, s));
    }, 
    // These are simple transformations/parsing, no need to not parallelize.
    // The dataflow blocks will handle the task allocation.
    new ExecutionDataflowBlockOptions {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });

    // For broadcasting to an action.
    var parserBroadcaster = new BroadcastBlock<Tuple<int, string>>(
        // Clone.
        t => new Tuple<int, string>(t.Item1, t.Item2));

    // Indicate what was parsed.
    var parserConsumer = new ActionBlock<Tuple<int, string>>(
        t => Debug.WriteLine(
            string.Format(CultureInfo.InvariantCulture, 
                "Consumed - Item1: {0}, Item2: \"{1}\"",
            t.Item1, t.Item2)));

    // Link downloader to parser.
    downloader.LinkTo(parser);

    // Parser to broadcaster.
    parser.LinkTo(parserBroadcaster);

    // Broadcaster to consumer.
    parserBroadcaster.LinkTo(parserConsumer);

    // Broadcaster back to the downloader.
    parserBroadcaster.LinkTo(downloader);

    // Start the downloader.
    downloader.Post(new Tuple<int, string>(1, ""));

    // Wait on the consumer to finish.
    parserConsumer.Completion.Wait();
}

Its output (as expected, before it hangs) is:

Consumed - Item1: 1, Item2: "1"
Consumed - Item1: 2, Item2: "1"
Consumed - Item1: 1, Item2: "11"
Consumed - Item1: 2, Item2: "11"
Consumed - Item1: 1, Item2: "12"
Consumed - Item1: 2, Item2: "12"
Consumed - Item1: 1, Item2: "111"
Consumed - Item1: 2, Item2: "111"
Consumed - Item1: 1, Item2: "112"
Consumed - Item1: 2, Item2: "112"
Consumed - Item1: 1, Item2: "121"
Consumed - Item1: 2, Item2: "121"
Consumed - Item1: 1, Item2: "122"
Consumed - Item1: 2, Item2: "122"
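While investigating a hang like this, a bounded wait can keep the test runner from blocking forever. The following is only a sketch: it uses the Task.Wait(TimeSpan) overload, which returns false when the task has not finished in time, and the 500 ms timeout is an arbitrary value picked for illustration.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static void Main()
    {
        var consumer = new ActionBlock<int>(n => Console.WriteLine(n));
        consumer.Post(1);

        // Nobody calls Complete, so Completion never finishes on its own.
        bool finished = consumer.Completion.Wait(TimeSpan.FromMilliseconds(500));

        Console.WriteLine(finished
            ? "completed"
            : "still waiting after 500 ms");
    }
}
```

In a unit test, the returned flag could be asserted so that a starved pipeline shows up as a failure rather than a stuck run.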

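Best answer

The TPL Dataflow framework doesn't offer anything that handles this out of the box. It is more of a state-management problem.

That is, the key is to keep track of the URLs that have been downloaded as well as the URLs that still need to be downloaded.

The ideal place to handle this is the parser block; that is where you have both the content (which will be turned into more links to download) and the URL the content was downloaded from.

Working from the sample above, a way to capture the download result together with the URI it was downloaded from needs to be introduced (I would have used a Tuple, but that would make things too confusing):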

public class DownloadResult
{
    public Tuple<int, string> Uri { get; set; }
    public string Content { get; set; }
}

From there, the downloader block is almost the same, just updated to output the structure above:

[TestMethod]
public void TestSpider2()
{
    // The list of numbers.
    var numbers = new[] { 1, 2 };

    // Performs the downloading.
    var downloader = new TransformBlock<Tuple<int, string>, DownloadResult>(
        t => new DownloadResult { 
            Uri = t, 
            Content = t.Item2 + 
                t.Item1.ToString(CultureInfo.InvariantCulture) 
        },

        // Let's assume four downloads to a domain at a time.
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 }
    );

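The consumer of the parser doesn't need to change, but it does need to be declared earlier, because the parser has to signal the consumer that it should stop consuming, and we want to capture it in the closure passed to the parser: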

// Indicate what was parsed.
var parserConsumer = new ActionBlock<Tuple<int, string>>(
    t => Debug.WriteLine(
        string.Format(CultureInfo.InvariantCulture, 
            "Consumed - Item1: {0}, Item2: \"{1}\"",
            t.Item1, t.Item2)));

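Now the state manager has to be introduced:

// The set of items that still need to be processed.
var itemsToProcess = new HashSet<Tuple<int, string>>();

At first, I thought of just using a ConcurrentDictionary<TKey, TValue>, but since the removal and the multiple additions have to happen as one atomic operation, it doesn't provide what is needed. A simple lock statement is the best choice here.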
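The remove-then-add step that needs to be atomic can be sketched in isolation as follows. PendingSet<T> and its Exchange method are hypothetical names introduced here for illustration; they are not part of TPL Dataflow or of the answer's code, which locks on the HashSet directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public sealed class PendingSet<T>
{
    private readonly HashSet<T> _items = new HashSet<T>();
    private readonly object _gate = new object();

    public void Add(T item)
    {
        lock (_gate) _items.Add(item);
    }

    // Removes the finished item and adds the newly discovered ones as a
    // single atomic step. Returns true when nothing is left to process.
    public bool Exchange(T finished, IEnumerable<T> discovered)
    {
        lock (_gate)
        {
            _items.Remove(finished);
            foreach (T item in discovered) _items.Add(item);
            return _items.Count == 0;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var pending = new PendingSet<string>();
        pending.Add("root");

        Console.WriteLine(pending.Exchange("root", new[] { "a", "b" })); // False
        Console.WriteLine(pending.Exchange("a", new string[0]));         // False
        Console.WriteLine(pending.Exchange("b", new string[0]));         // True
    }
}
```

Without the lock, another thread could observe the set as empty between the removal and the additions and report completion too early. A concurrent collection makes each individual call atomic, but not the combination of calls.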

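The parser is what changes the most. It parses the items as usual, but also does the following atomically:

Removes the URL from the state machine (itemsToProcess).
Adds the new URLs to the state machine.
If no items remain in the state machine after the steps above, signals the consumer block that it is done by calling the Complete method on the IDataflowBlock interface.

It looks like this: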

// Changes content into items and new URLs to download.
var parser = new TransformManyBlock<DownloadResult, Tuple<int, string>>(
    r => {
        // The parsed items.
        IEnumerable<Tuple<int, string>> parsedItems;

        // If the string length is greater than three, return an
        // empty sequence.
        // This is the signal for this branch to stop.
        parsedItems = (r.Uri.Item2.Length > 3) ? 
            Enumerable.Empty<Tuple<int, string>>() :
            numbers.Select(n => new Tuple<int, string>(n, r.Content));

        // Materialize the list.
        IList<Tuple<int, string>> materializedParsedItems = 
            parsedItems.ToList();

        // Lock here; the removal from the set of items to
        // process and the addition of the new items must
        // happen atomically.
        lock (itemsToProcess)
        {
            // Remove the item.
            itemsToProcess.Remove(r.Uri);

            // If this item produced no new items and nothing else
            // is pending, finish the action block.
            if (materializedParsedItems.Count == 0 && 
                itemsToProcess.Count == 0)
            {
                // Complete the consumer block.
                parserConsumer.Complete();
            }

            // Add the items.
            foreach (Tuple<int, string> newItem in materializedParsedItems) 
                itemsToProcess.Add(newItem);

            // Return the items.
            return materializedParsedItems;
        }
    }, 

    // These are simple transformations/parsing, no need to not 
    // parallelize.  The dataflow blocks will handle the task 
    // allocation.
    new ExecutionDataflowBlockOptions {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });

The broadcaster and the links are the same:

// For broadcasting to an action.
var parserBroadcaster = new BroadcastBlock<Tuple<int, string>>(
    // Clone.
    t => new Tuple<int, string>(t.Item1, t.Item2));

// Link downloader to parser.
downloader.LinkTo(parser);

// Parser to broadcaster.
parser.LinkTo(parserBroadcaster);

// Broadcaster to consumer.
parserBroadcaster.LinkTo(parserConsumer);

// Broadcaster back to the downloader.
parserBroadcaster.LinkTo(downloader);

When starting the blocks, the state machine has to be primed with the URL to download before the root is passed to the Post method:

// The initial post to download.
var root = new Tuple<int, string>(1, "");

// Add to the items to process.
itemsToProcess.Add(root);

// Post to the downloader.
downloader.Post(root);

The call to the Wait method on the Task class stays the same, and it now completes without hanging:

    // Wait on the consumer to finish.
    parserConsumer.Completion.Wait();
}

This question, .net - How can I make an IPropagatorBlock<TInput, TOutput> stop automatically?, comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/13202328/