
ruby-on-rails - Nokogiri: Searching for <div> using XPath

Reposted. Author: 数据小太阳. Updated: 2023-10-29 07:13:17

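I use Nokogiri (a RubyGem) and its css search to find certain <div> elements in my HTML. It looks like Nokogiri's css search doesn't like regular expressions. I would like to switch to Nokogiri's xpath search, since that seems to support regular expressions in the search string.

How can I implement the (pseudo) css search shown below as an xpath search?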

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
<html>
<body>
<p id='para-1'>A</p>
<p id='para-22'>B</p>
<h1>Bla</h1>
<p id='para-3'>C</p>
<p id='para-4'>D</p>
<div class="foo" id="eq-1_bl-1">
<p id='para-5'>3</p>
</div>
</body>
</html>
HTML_END

# my_block is given
my_bl = "1"
# my_eq corresponds to this regex
my_eq = "\/[0-9]+\/"

# FIXME The following line should be changed to an xpath search.
if my_div = value.css("div#eq-#{my_eq}_bl-#{my_bl}.foo").first
# doing some stuff with the <p> inside the div
end

Best answer

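Mike Dalessio (one half of the Nokogiri core developers) gave me an answer on #nokogiri (irc.freenode.net). It looks like neither Nokogiri's CSS search nor its XPath search supports regular-expression matching. Here is his solution for searching with a regex in Nokogiri: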

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
<html>
<body>
<p id='para-1'>A</p>
<p id='para-22'>B</p>
<h1>Bla</h1>
<p id='para-3'>C</p>
<p id='para-4'>D</p>
<div class="foo" id="eq-1_bl-1">
<p id='para-5'>3</p>
</div>
<div class="bar" id="eq-1_bl-1">
<p id='para-5'>3</p>
</div>
</body>
</html>
HTML_END

# my_block is given
my_bl = "1"
# my_eq corresponds to this regex
my_eq = "[0-9]+"
# full regex to search for in node ids
full_regex = %r(eq-#{my_eq}_bl-#{my_bl})

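# Anonymous handler class: Nokogiri calls #filter for the custom
# :filter() pseudo-class used in the CSS query below.
filter_by_id = Class.new do
  attr_accessor :matches

  def initialize(regex)
    @regex = regex
    @matches = []
  end

  # Called by Nokogiri with the candidate node set; nodes whose id
  # attribute matches the regex are accumulated in @matches.
  def filter(node_set)
    @matches += node_set.find_all { |x| x['id'] =~ @regex }
  end
end.new(full_regex)

# div.foo narrows by class; :filter() delegates to filter_by_id.filter.
value.css("div.foo:filter()", filter_by_id)

# Read the results from the handler, not from the return value of #css.
filter_by_id.matches.each do |node|
  puts node
end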

Regarding "ruby-on-rails - Nokogiri: Searching for <div> using XPath", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/649963/
