
java - Parsing simple HTML with Java

Reposted. Author: 行者123. Last updated: 2023-12-02 06:07:07

How can I parse just part of an HTML file? For example, I want to display "Here are OL list items:".

Example "file.html":

<h1>Heading 1</h1>
<h2>Heading 2</h2>
<p>This is some html. Look, here's an <u>underline</u>.</p>
<p>Look, this is <em>emphasized.</em> And here's some <b>bold</b>.</p>
<p>Here are UL list items:
<ul>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ul>
<p>Here are OL list items:
<ol>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ol>

What I tried:

webView.loadUrl("file:///android_asset/file.html");

But it displays the entire HTML.

Best answer

Learn to parse HTML Pages on Android with JSoup

When you build Android applications, you may need to parse HTML data or HTML pages fetched from the Web. One of the best-known solutions for this in Java is the JSoup library. As the official JSoup website puts it: “It is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jquery-like methods.”

JSoup can be used in Android applications, and this tutorial shows how to parse an HTML page on Android with it. A video version of the tutorial is available on YouTube:

https://www.youtube.com/watch?v=BqMIcugsCFc

First, add the JSoup dependency to your Gradle build file (the older compile configuration was removed in Gradle 7, so implementation is used here):

implementation 'org.jsoup:jsoup:1.10.1'

For our example, we will download the content of SSaurel's blog and display all the links on its main page. To download the content of a website, JSoup offers the connect method followed by the get method. The get method works synchronously, and Android does not allow network calls on the main thread, so these methods must be called on a separate thread. Our application has a simple layout with a Button to start the download and a TextView to display the links.

The layout looks like this:

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/activity_main"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingBottom="@dimen/activity_vertical_margin"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    tools:context="com.ssaurel.jsouptut.MainActivity">

    <Button
        android:id="@+id/getBtn"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Get website"
        android:layout_marginTop="50dp"
        android:layout_centerHorizontal="true"/>

    <TextView
        android:id="@+id/result"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Result ..."
        android:layout_centerHorizontal="true"
        android:layout_marginTop="30dp"
        android:layout_below="@id/getBtn"
        android:textSize="17sp"/>
</RelativeLayout>

In the main Activity of the application, we get references to the Button and the TextView from our layout. Then we set a click listener on the Button to start the download when the user clicks it.

In the getWebsite() method, we create a new Thread to download the content of the website. We call the static connect() method of the Jsoup class to set up a connection to the website, then call get() to download and parse the content. The result is a Document instance. We call its select() method with the CSS query a[href] to get all the links in the content. The query returns an Elements instance, and finally we iterate over the elements it contains to append the URL and text of each link to the output.

At the end of the background thread, we refresh the UI with the links obtained from the website. This refresh is wrapped in a runOnUiThread call because UI elements must not be updated from a background thread.
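The hand-off between the two threads can be tried outside Android with plain JDK classes. The sketch below is only an illustration under stated assumptions: the class HandOffSketch, the method loadAndDisplay, and the single-threaded executor named "ui" are hypothetical stand-ins (the executor plays the role of Android's main thread, and execute() plays the role of runOnUiThread); no real download takes place.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class HandOffSketch {

    /** Does the slow work on a worker thread, then publishes the result on the "ui" thread. */
    static String loadAndDisplay(final String source) {
        // Hypothetical stand-in for Android's main thread.
        final ExecutorService uiThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "ui"));
        final AtomicReference<String> displayed = new AtomicReference<>();
        final CountDownLatch shown = new CountDownLatch(1);

        new Thread(() -> {
            // Blocking work (the download in the tutorial) happens off the "ui" thread.
            final String result = "links from " + source;
            // Hand the result back, as runOnUiThread does on Android.
            uiThread.execute(() -> {
                displayed.set(Thread.currentThread().getName() + ": " + result);
                shown.countDown();
            });
        }, "worker").start();

        try {
            // Wait until the "ui" thread has published the result.
            if (!shown.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("UI update did not happen in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            uiThread.shutdown();
        }
        return displayed.get();
    }

    public static void main(String[] args) {
        System.out.println(loadAndDisplay("blog")); // prints "ui: links from blog"
    }
}
```

The returned string records which thread performed the update, which makes it easy to check that the result was produced on the worker but published on the "ui" thread.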

The code of MainActivity looks like this:

package com.ssaurel.jsouptut;

import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.widget.Button;
import android.widget.TextView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

public class MainActivity extends AppCompatActivity {

    private Button getBtn;
    private TextView result;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        result = (TextView) findViewById(R.id.result);
        getBtn = (Button) findViewById(R.id.getBtn);
        // Start the download when the button is clicked.
        getBtn.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View view) {
                getWebsite();
            }
        });
    }

    private void getWebsite() {
        // Network access must happen off the UI thread.
        new Thread(new Runnable() {
            @Override
            public void run() {
                final StringBuilder builder = new StringBuilder();

                try {
                    // Download and parse the page, then select every anchor with an href.
                    Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
                    String title = doc.title();
                    Elements links = doc.select("a[href]");

                    builder.append(title).append("\n");

                    for (Element link : links) {
                        builder.append("\n").append("Link : ").append(link.attr("href"))
                                .append("\n").append("Text : ").append(link.text());
                    }
                } catch (IOException e) {
                    builder.append("Error : ").append(e.getMessage()).append("\n");
                }

                // Views may only be updated from the UI thread.
                runOnUiThread(new Runnable() {
                    @Override
                    public void run() {
                        result.setText(builder.toString());
                    }
                });
            }
        }).start();
    }
}

The download only works if the app's manifest declares the android.permission.INTERNET permission. The last step is to run the application and enjoy the result, with all the links of SSaurel's blog displayed on the screen.
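The tutorial fetches a remote page, whereas the question is about a local file.html. The same select() approach should carry over; the sketch below is one possible way to do it, not the tutorial author's code. It assumes JSoup is on the classpath, and the class name OlSectionSketch, the methods olHeading and olItems, and the inlined, shortened copy of file.html are all made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class OlSectionSketch {

    // Shortened copy of the question's file.html, inlined so the sketch is self-contained.
    static final String HTML =
            "<h1>Heading 1</h1>\n"
          + "<p>Here are UL list items:\n"
          + "<ul><li>One</li><li>Two</li><li>Three</li></ul>\n"
          + "<p>Here are OL list items:\n"
          + "<ol><li>One</li><li>Two</li><li>Three</li></ol>\n";

    /** Returns the text of the element that directly precedes the first <ol>, or "" if none. */
    static String olHeading(String html) {
        Document doc = Jsoup.parse(html);
        Element ol = doc.select("ol").first();
        if (ol == null) {
            return "";
        }
        // The parser closes the open <p> when it reaches <ol>, so the <p> is a sibling.
        Element previous = ol.previousElementSibling();
        return previous == null ? "" : previous.text();
    }

    /** Returns the text of every <li> that is a direct child of an <ol>. */
    static List<String> olItems(String html) {
        List<String> items = new ArrayList<>();
        for (Element li : Jsoup.parse(html).select("ol > li")) {
            items.add(li.text());
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(olHeading(HTML)); // prints "Here are OL list items:"
        System.out.println(olItems(HTML));   // prints "[One, Two, Three]"
    }
}
```

On Android, the asset could presumably be read with getAssets().open("file.html") and passed to Jsoup.parse(InputStream, String, String) with "UTF-8" as the charset and an empty base URI; the extracted text can then be shown in a TextView instead of loading the whole file into a WebView.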


https://medium.com/@ssaurel/learn-to-parse-html-pages-on-android-with-jsoup-2a9b0da0096f

About "java - Parsing simple HTML with Java": a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55935418/
