ruby - Nokogiri to select text and HTML between unique sets of tags

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:25:06
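I am trying to use Nokogiri to extract the text between two unique sets of tags.

What is the best way to get the text inside the p tag between <h2 class="point">The problem</h2> and <h2 class="point">The solution</h2>, and then all of the HTML between <h2 class="point">The solution</h2> and <div class="frame box sketh">?

Full HTML sample: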

<h2 class="point">The problem</h2>
<p>TEXT I WANT </p>
<h2 class="point">The solution</h2>
HTML I WANT with it's own set of tags (but never an <h2> or <div>)
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Thanks!

Best answer

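require 'nokogiri'

# DATA reads the HTML that follows __END__ in this same file.
doc = Nokogiri::HTML(DATA)

# Visit every sibling node after an <h2>, skipping <h2> and <div> elements
# and nodes that have no child text node (such as the bare newlines between tags).
doc.search('//h2/following-sibling::node()[name() != "h2" and name() != "div" and text() != "\n"]').each do |block|
  p block.text
end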

__END__
<h2 class="point">The problem</h2>
<p>TEXT I WANT</p>
<h2 class="point">The solution</h2>
<div>dont capture this</div>
<span>HTML I WANT with it's <p>own set <b>of</b> tags</p></span>
<div class="frame box sketh"><img src="URL for Image I want later" alt="" /></div>

Output:

"TEXT I WANT"
"HTML I WANT with it's own set of tags"

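This XPath selects every following sibling node of an h2 that is not itself an h2 or a div and that has at least one child text node different from the string "\n". Because text() matches child text nodes, the bare text nodes between the tags (such as the newlines) have none and are filtered out as well.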

Regarding "ruby - Nokogiri to select text and HTML between unique sets of tags", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/12478272/
