html - Remove specific tags if they are inside a specific tag - 6ren

Reposted by: 数据小太阳 | Updated: 2023-10-29 08:38:47

I have a problem and I need to find a quick solution.

I want to remove all br and p tags inside a table, but not the ones outside it.

For example:

Initial HTML document:

...
<p>Hello</p>
<table>
  <tr>
    <td><p>Text example <br>continues...</p></td>
    <td><p>Text example <br>continues...</p></td>
    <td><p>Text example <br>continues...</p></td>
    <td><p>Text example <br>continues...</p></td>
  </tr>
</table>
<p>Bye<br></p>
<p>Bye<br></p>
...

My goal:

...
<p>Hello</p>
<table>
  <tr>
    <td>Text example continues...</td>
    <td>Text example continues...</td>
    <td>Text example continues...</td>
    <td>Text example continues...</td>
  </tr>
</table>
<p>Bye<br></p>
<p>Bye<br></p>
...

Right now, this is my cleaning method:

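# Each pass strips only the first <p>, </p> or <br> tag found in a table,
# then the loop rescans the whole document, so it needs one full rescan
# per removed tag, which makes it slow on large documents.
# Note: without the /m flag `.` does not match newlines, so as written
# the pattern only matches tables that are written on a single line.
loop do
  if html.match(/<table>(.*?)(<\/?(p|br)*?>)(.*?)<\/table>/) != nil
    html = html.gsub(/<table>(.*?)(<\/?(p|br)*?>)(.*?)<\/table>/,'<table>\1 \4</table>')
  else
    break
  end
end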

It works well, but the problem is that I have 1xxx documents, each about 1000 lines long... and each document takes 1-3 hours. (1-3 hours) × (a thousand-odd documents) = pain!
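The loop rescans the entire document once for every tag it removes. A minimal single-pass sketch, assuming tables are never nested and no `>` appears inside tag attributes, matches each table once (with `/m` so `.` spans newlines) and strips all of its `<p>`, `</p>` and `<br>` tags with one inner `gsub`. `TABLE_BLOCK`, `P_OR_BR_TAG` and `strip_table_tags` are hypothetical names, not part of any library.

```ruby
# Matches one whole <table>...</table> block, across newlines (/m).
# Assumption: tables are not nested; an inner </table> would end the match early.
TABLE_BLOCK = %r{<table\b.*?</table>}mi

# Matches <p ...>, </p>, <br>, <br/> (but not e.g. <pre> or <param>,
# because \b requires the tag name to end right after "p" or "br").
P_OR_BR_TAG = %r{</?(?:p|br)\b[^>]*>}i

# Strips <p>, </p> and <br> tags only inside tables and leaves the rest
# of the document untouched. Each table is visited once instead of
# rescanning the whole document for every single tag.
def strip_table_tags(html)
  html.gsub(TABLE_BLOCK) { |table| table.gsub(P_OR_BR_TAG, "") }
end

html = <<~HTML
  <p>Hello</p>
  <table>
    <tr>
      <td><p>Text example <br>continues...</p></td>
    </tr>
  </table>
  <p>Bye<br></p>
HTML

puts strip_table_tags(html)
```

For the batch job, the helper could be applied file by file, e.g. `Dir.glob("docs/**/*.html").each { |path| File.write(path, strip_table_tags(File.read(path))) }`, where the `docs/` directory is a hypothetical placeholder. Regular expressions do not understand HTML structure, so the parser-based approach in the best answer is the safer choice when the markup is irregular.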

I'd like to do it with Sanitize or some other method, but so far... I can't find a way.

Can anyone help me?

Thanks in advance! Manu

Best answer

Using Nokogiri:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-_HTML_
<p>Hello</p>
<table>
  <tr>
    <td><p>Text example <br>continues...</p></td>
    <td><p>Text example <br>continues...</p></td>
    <td><p>Text example <br>continues...</p></td>
    <td><p>Text example <br>continues...</p></td>
  </tr>
</table>
<p>Bye<br></p>
<p>Bye<br></p>
_HTML_

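# Replace every <p> that sits directly in a table cell with its plain text.
# The <br> inside it has no text of its own, so it disappears as well.
doc.xpath("//table/tr/td/p").each do |el|
  el.replace(el.text)
end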

puts doc.to_html

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>Hello</p>
<table><tr>
<td>Text example continues...</td>
    <td>Text example continues...</td>
    <td>Text example continues...</td>
    <td>Text example continues...</td>
  </tr></table>
<p>Bye<br></p>
<p>Bye<br></p>
</body>
</html>

A similar question about "html - Remove specific tags if they are inside a specific tag" can be found on Stack Overflow: https://stackoverflow.com/questions/17952371/
