
html - How can I make Nokogiri transparently return HTML entities untouched (not re-encoded)?

Reposted. Author: 太空狗. Updated: 2023-10-29 15:33:46

How can I use Nokogiri without it altering characters such as German umlauts, i.e. without it turning ö into the HTML entity &ouml;?

For example:

# this is fine
node = Nokogiri::HTML.fragment('<p>&ouml;</p>')
node.to_s # => '<p>&ouml;</p>'

# this is not
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>&ouml;</p>'

# this is what I need
node = Nokogiri::HTML.fragment('<p>ö</p>')
node.to_s # => '<p>ö</p>'

I have experimented with both PARSE_OPTIONS and the :save_with option, but could not find a way to make Nokogiri behave transparently as shown above.

Any suggestions?

Best answer

OK, Aaron answered my question via Twitter, with this gist:

require 'rubygems'
require 'nokogiri'

doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'

# We added a contextual fragment method for the 1.4.2 release. This *might*
# work in 1.4.1. If you want to mess with 1.4.2, build from my github, or
# grab one of our nightly builds:
#
# $ sudo gem install nokogiri -s http://tenderlovemaking.com/
#
# Also, libxml2 had a bug with encoding when handling UTF-8 fragments, so I
# suggest you also upgrade to libxml2 2.7.7.
#
# Hope that helps!
puts doc.fragment('<p>ö</p>')

Regarding "html - How can I make Nokogiri transparently return HTML entities untouched (not re-encoded)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2567029/
