html - Why does YQL return extra tags?

Reposted by: 行者123 Updated: 2023-11-28 03:19:10


I am running the following query in the YQL console:

select * from html
where url='http://www.motorni-masla.net/index.php?main_page=product_oil_info&cPath=140&products_id=294&zenid=c8281021bbfed454176247900b3b9d4a'
and xpath='//*[@id="productPrices"]'

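The idea is to find the element whose id is "productPrices" and return its content as JSON.

But when I do this, the result differs from the original markup: the tags are nested differently and extra ones appear.

Original content: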

<strong>
<h2 id="productPrices" class="productGeneral">
<span class="normalprice">14.00лв. </span>&nbsp;<span class="productSpecialPrice">11.00лв.</span><span class="productPriceDiscount">
<br>Спести:&nbsp;21% отстъпка</span>
</h2>
</strong>

YQL result:

{
   "h2": {
    "class": "productGeneral",
    "id": "productPrices",
    "strong": {
     "span": [
      {
       "class": "normalprice",
       "content": "14.00лв."
      },
      {
       "class": "productSpecialPrice",
       "content": "11.00лв."
      },
      {
       "class": "productPriceDiscount",
       "br": null,
       "content": "\nСпести: 21% отстъпка"
      }
     ],
     "content": "  "
    }
   }
}

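Basically, in the original content the nesting order is strong -> h2 -> span, whereas in the YQL result it is h2 -> strong -> span.

This makes my XPath useless, because I cannot use it in the YQL statement: it does not match the structure YQL ends up with. In another case, not only is the order different, but a <p> tag is added out of nowhere.

I would appreciate it if someone could tell me what is going on here.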

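Best answer

The page's markup is invalid. Evidently YQL's parser does one thing to repair it, while browsers (or at least my version of Chrome, and apparently whatever browser you are using) do another.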
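Because the two repairs nest the same elements differently, a query that does not depend on the parent chain may be more robust. The following is a minimal Python sketch of that idea, not something taken from the question or from YQL: it uses the standard library's XML parser purely because that parser does not repair anything, and the two fragments are trimmed, hand-written versions of the nesting shown in the question. The function name `special_price` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Nesting as written in the page source: strong -> h2 -> span (trimmed).
# Invalid as HTML, but well-formed as XML, so ElementTree accepts it.
AS_WRITTEN = (
    '<strong><h2 id="productPrices" class="productGeneral">'
    '<span class="normalprice">14.00лв. </span>'
    '<span class="productSpecialPrice">11.00лв.</span>'
    '</h2></strong>'
)

# Nesting implied by the YQL result: h2 -> strong -> span (trimmed).
AS_REPAIRED = (
    '<h2 id="productPrices" class="productGeneral"><strong>'
    '<span class="normalprice">14.00лв. </span>'
    '<span class="productSpecialPrice">11.00лв.</span>'
    '</strong></h2>'
)


def special_price(fragment):
    """Return the text of the special-price span, wherever it is nested."""
    root = ET.fromstring(fragment)
    # './/' searches all descendants, so the order of strong and h2
    # above the span does not matter.
    node = root.find(".//span[@class='productSpecialPrice']")
    return node.text.strip() if node is not None else None


for fragment in (AS_WRITTEN, AS_REPAIRED):
    print(special_price(fragment))
```

Both calls find the same span. In a YQL statement, the analogous move would be to select on the span's class rather than on a path through strong and h2, assuming the class names survive the repair as they do in the result above.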

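The invalid part is that you cannot put an h2 inside a strong. The content model of strong is phrasing content, but h2 is not phrasing content; it is only allowed where flow content is expected.

Regarding "html - Why does YQL return extra tags?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25487559/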
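To spot this kind of problem before writing an XPath against a page, one could scan the raw markup for flow-only elements that open inside a phrasing-only parent. The sketch below is illustrative only: the two element sets are small, hand-picked subsets of the categories defined in the HTML specification, and `NestingChecker` and `find_invalid_nesting` are hypothetical names. It relies on Python's `html.parser`, which reports tags as written and does not repair them.

```python
from html.parser import HTMLParser

# Illustrative subsets only; the HTML specification defines the full lists.
PHRASING_PARENTS = {"strong", "em", "b", "i", "span"}  # may contain only phrasing content
FLOW_ONLY = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "div", "ul", "ol", "table"}
VOID = {"br", "img", "hr", "input", "meta", "link"}  # never have an end tag


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, outermost first
        self.problems = []   # (phrasing parent, flow-only child) pairs

    def handle_starttag(self, tag, attrs):
        if tag in FLOW_ONLY:
            # Report the nearest open ancestor that only allows phrasing content.
            for ancestor in reversed(self.stack):
                if ancestor in PHRASING_PARENTS:
                    self.problems.append((ancestor, tag))
                    break
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass


def find_invalid_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


SAMPLE = """<strong>
<h2 id="productPrices" class="productGeneral">
<span class="normalprice">14.00лв. </span>
<br>Спести: 21% отстъпка
</h2>
</strong>"""

print(find_invalid_nesting(SAMPLE))
```

For the sample, which mirrors the markup in the question, the checker reports the pair ("strong", "h2"). A non-empty result suggests that different parsers may rebuild the tree differently, so paths that depend on the exact parent chain are fragile.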
