
Java HTML normalizer?

Reposted. Author: 搜寻专家. Updated: 2023-11-01 03:13:08

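Is there a library that can transform any given HTML page, with JS and CSS all over it, into a minimalistic, uniform format?

For instance, if we render the Stack Overflow homepage, I want it shown in a minimal format, and I want every other site rendered down the same way.

Something like the Lynx web browser, but with minimal graphics.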

Best answer

The best tool I have come across for converting HTML to Lynx-style text is Jericho's Renderer.

It is easy to use:

    import java.net.URL;
    import net.htmlparser.jericho.Source;

    // Load the page from a URL, or pass raw markup instead: new Source("<html>raw html string</html>")
    Source source = new Source(new URL(sourceUrlString));
    String renderedText = source.getRenderer().toString();
    System.out.println("\nSimple rendering of the HTML document:\n");
    System.out.println(renderedText);

(from here)

It also copes well with the malformed HTML found in the wild.

Here are the first few lines of this page formatted this way with Jericho:

Stack Exchange log in | careers | chat | meta | about | faq

Stack Overflow * Questions * Tags * Users * Badges * Unanswered * Ask Question

Java HTML normalizer?

**

Is there a library which can transform any given HTML page with JS, CSS all over it, into a minimalistic uniform format?

For instance, if we render stackoverflow homepage, I want it to be shown in a minimal format. I want all other sites to be rendered down.

Sort of like Lynx web browser but with minimal graphics.

java lynx

asked 2 days ago by Kim Jong Woo

Do you want to transform your HTML code to simpler HTML code, or do you want to show this "minimalistic uniform format" to your user? Or do you want to create an image? – Paŭlo Ebermann yesterday

Simpler HTML code without sacrificing the relative positioning of the elements. – Kim Jong Woo 16 hours ago

2 Answers

To answer your first question: no, I don't think there is a library for that purpose. (At least, that is what my googling turned up.)

I think the reason is that what you want is a very specialized need.

So, as a solution, you can parse the HTML and display it the way you want in a JEditorPane or whatever component you use for display.

I can only suggest the way I would do it (because I am familiar with XML and everything around it).

* Use a library to ensure that your HTML conforms to XHTML: http://htmlcleaner.sourceforge.net/release.php

* Then either parse the XML with a DOM or SAX parser and display it the way you want,

or

* use XSLT to transform the document into another HTML document whose view fits your needs,

or

* use one of the available HTML parser libraries. (Most of the ones I found were rather outdated (2006), but they could be an option for you.)
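
The DOM route from the list above could look roughly like the sketch below. It uses only the JDK and assumes the markup has already been cleaned into well-formed XHTML (for example by HtmlCleaner) with no DOCTYPE or HTML-only entities. The class name `XhtmlTextWalker`, the method `toPlainText`, and the set of block elements are hypothetical choices made for illustration.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: walks a DOM tree and emits Lynx-like plain text.
class XhtmlTextWalker {

    // Elements that end a line in the text output (an illustrative subset).
    private static final Set<String> BLOCKS =
            Set.of("h1", "h2", "h3", "p", "div", "li", "tr", "br");

    static String toPlainText(String xhtml) {
        try {
            // Assumption: the input is well-formed XML, otherwise parse() fails.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), out);
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    private static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.append(text).append(' ');
            }
            return;
        }
        String name = node.getNodeName().toLowerCase();
        // Drop scripts and stylesheets entirely.
        if (name.equals("script") || name.equals("style")) {
            return;
        }
        if (name.equals("li")) {
            out.append("* ");
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
        if (BLOCKS.contains(name)) {
            // Replace the trailing separator space with a line break.
            int len = out.length();
            if (len > 0 && out.charAt(len - 1) == ' ') {
                out.setLength(len - 1);
            }
            out.append('\n');
        }
    }
}
```

Calling `XhtmlTextWalker.toPlainText(cleanedXhtml)` returns one line per block element, with list items prefixed by `* `. A real page would need more element rules (links, tables, images) than this sketch covers.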

This is just one suggestion for how you could do it. I'm sure there are many other ways to achieve the same thing.
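
For the XSLT option, a minimal sketch with the JDK's built-in `javax.xml.transform` API might look like this. The stylesheet is a hypothetical one written for illustration: it copies everything through unchanged except `script`, `style` and `link` elements and `style`/`class` attributes, which it drops. It assumes the input is already well-formed XHTML, and `XhtmlSimplifier` and `simplify` are made-up names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: strips scripting and styling from XHTML via XSLT.
class XhtmlSimplifier {

    // Identity transform plus one empty template that swallows the unwanted nodes.
    // local-name() is used so the match works with or without the XHTML namespace.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='@*|node()'>"
          + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"*[local-name()='script' or local-name()='style'"
          + " or local-name()='link']|@style|@class\"/>"
          + "</xsl:stylesheet>";

    static String simplify(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)),
                                  new StreamResult(result));
            return result.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform the document", e);
        }
    }
}
```

Because the element structure is copied as-is, the relative order of the content is preserved, while the visual layout that depended on the removed CSS is not. Further simplification rules would go into the stylesheet as additional templates.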

Regarding "Java HTML normalizer?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/5139033/
