html - C++: How to recursively/iteratively search an HTML file (using Boost C++)?

Reposted by: 行者123 Updated: 2023-11-28 06:41:48

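I am developing an application in which I need to fetch an HTML file (from the web) and extract a piece of information from it by searching for a string.

I think it would be more efficient and easier to treat the HTML file as an XML file, iterate over its tags, and match their content against the string.

This is the HTML table I am interested in: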

<table width='100%' class='datatable' cellspacing='0' cellpadding='0'>
  <tr>
    <td>
    </td>
    <td width='30px'>
    </td>
    <td width='220px'>
    </td>
    <td width='50px'>
    </td>
  </tr>
  <tr>
    <td height='7' colspan='4'>
      <img src='/images/spacer.gif' width='1' height='7' border='0' alt=''>
    </td>
  </tr>
  <tr>
    <td width='170'>
      Aktiv tid: <!--This is a string I will search for.-->
    </td>
    <td colspan='3'>
      1 dag, 17:03:46 <!--This is a piece of information I need to obtain.-->
    </td>
  </tr>
  <tr>
    <td height='7' colspan='4'>
      <img src='/images/spacer.gif' width='1' height='7' border='0' alt=''>
    </td>
  </tr>
  <tr>
    <td width='170'>
      Bandbredd (upp/ned) [kbps/kbps]:
    </td>
    <td colspan='3'>
      1.058 / 21.373
    </td>
  </tr>
  <tr>
    <td height='7' colspan='4'>
      <img src='/images/spacer.gif' width='1' height='7' border='0' alt=''>
    </td>
  </tr>
  <tr>
    <td width='170'>
      Överförda data (skickade/mottagna) [GB/GB]: <!--This is another string I will search for.-->
    </td>
    <td colspan='3'>
      1,67 / 42,95 <!--This is another piece of information I need to obtain.-->
    </td>
  </tr>
</table>

So I will search for a <td> tag that contains either of the following strings:

Aktiv tid:
Överförda data (skickade/mottagna) [GB/GB]:

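After that I need to select the next <td> tag (in the same <tr>), which contains the information I want.

I successfully fetched the HTML file using cURL, but I need some help with the XML search algorithm.

Thanks in advance!

(Edit: here is pseudocode for the application I have in mind (it should be self-explanatory):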

extern "C" {
    #include "url.h"
}

#include <string>
#include <iostream>

std::string xmlSearch(std::string fn, std::string str);

int main(void)
{
    /* download HTML file from URL to file */
    url("http://myurl.com/","page.html");

    /* search page.html for "Aktiv tid:" and return the content of the next <td> tag. */
    std::string data0 = xmlSearch("page.html","Aktiv tid:");

    /* search page.html for "Överförda data (skickade/mottagna) [GB/GB]:" and return the content of the next <td> tag. */
    std::string data1 = xmlSearch("page.html","Överförda data (skickade/mottagna) [GB/GB]:");

    /* process results */
}

std::string xmlSearch(std::string fn, std::string str){
    /* perform search algorithm */

    /* return content of the next <td> tag. */
}

)

Best answer

I can imagine myself doing this with a quick and dirty script rather than C++, really.

A one-liner:

(tidy -asxml input.xml | xmllint --xpath 'descendant-or-self::*[starts-with(text(), "Aktiv tid:")]/following-sibling::*/text()' -) 2>/dev/null

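Here:

tidy converts the quirky HTML into XML
xmllint queries it:
- starting from any element (*) that matches [starts-with(text(), "Aktiv tid:")]
- it selects text() from the following siblings
2>/dev/null suppresses the error output of tidy and xmllint

Presto, it prints: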

1 dag, 17:03:46

for the exact input from your question.
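
If the lookup has to stay in C++ after all, the same idea (find the cell whose text starts with the label, then take the next cell in the same row) can be sketched with plain std::string searching. This is only a rough sketch, not an HTML parser and not what the one-liner does internally: the names nextCellText, stripComments and trimWhitespace are made up for illustration, and the code assumes lowercase, non-nested <td> tags, no '>' inside attribute values, and no "</td>" inside comments, as in the table from the question. It works on the page content already loaded into a string and returns an empty string when nothing matches.

```cpp
#include <cstddef>
#include <string>

// Trim ASCII whitespace from both ends of a string.
std::string trimWhitespace(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Remove every <!-- ... --> comment from the text.
std::string stripComments(std::string s) {
    std::size_t open;
    while ((open = s.find("<!--")) != std::string::npos) {
        const std::size_t close = s.find("-->", open + 4);
        if (close == std::string::npos) {  // unterminated comment: drop the rest
            s.erase(open);
            break;
        }
        s.erase(open, close + 3 - open);
    }
    return s;
}

// Hypothetical helper: find the first <td> whose text starts with `label`
// and return the text of the next <td> in the same <tr>.
// Returns an empty string if there is no such cell.
// Assumes lowercase, non-nested <td> tags (an assumption, not a general rule).
std::string nextCellText(const std::string& html, const std::string& label) {
    const std::string tdOpen = "<td", tdClose = "</td>", trClose = "</tr>";
    std::size_t pos = 0;
    std::size_t open;
    while ((open = html.find(tdOpen, pos)) != std::string::npos) {
        const std::size_t tagEnd = html.find('>', open);
        if (tagEnd == std::string::npos) break;
        const std::size_t close = html.find(tdClose, tagEnd + 1);
        if (close == std::string::npos) break;

        const std::string text = trimWhitespace(
            stripComments(html.substr(tagEnd + 1, close - tagEnd - 1)));
        pos = close + tdClose.size();

        // "starts with" check, like starts-with() in the XPath query
        if (text.compare(0, label.size(), label) != 0) continue;

        // The next <td> only counts if it comes before the row ends.
        const std::size_t rowEnd = html.find(trClose, pos);
        const std::size_t nextOpen = html.find(tdOpen, pos);
        if (nextOpen == std::string::npos ||
            (rowEnd != std::string::npos && rowEnd < nextOpen)) {
            continue;  // label cell has no following cell in its row
        }
        const std::size_t nextTagEnd = html.find('>', nextOpen);
        if (nextTagEnd == std::string::npos) break;
        const std::size_t nextClose = html.find(tdClose, nextTagEnd + 1);
        if (nextClose == std::string::npos) break;
        return trimWhitespace(stripComments(
            html.substr(nextTagEnd + 1, nextClose - nextTagEnd - 1)));
    }
    return "";
}
```

To use it with the downloaded file, one could read page.html into a std::string (for example through std::ifstream and std::ostringstream) and call nextCellText(content, "Aktiv tid:"). For markup less regular than this table, running tidy first or using a real XML/HTML parser is likely the safer route.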

Regarding html - C++: How to recursively/iteratively search an HTML file (using Boost C++)?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25835729/
