
Java website parser

Reposted. Author: 行者123. Updated: 2023-11-29 03:15:58

I am trying to parse the following line from a site:

<div class="search-result__price">£2,995</div>

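I only want the 2995 part, but I am struggling to get it. Here is my code; at the moment it finds every line containing the £ symbol and collects every currency amount on the page. Please help!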

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Scanner;

public class parser {

    // Prices appear in the raw HTML as the entity "&pound;" followed by digits.
    private static String string1 = "&pound";
    private String testURL = "http://www.autotrader.co.uk/search/used/cars/bmw/1_series/postcode/tn126bg/radius/1500/onesearchad/used%2Cnearlynew%2Cnew/quicksearch/true/page/2";
    private ArrayList<String> list = new ArrayList<String>();
    private ArrayList<Integer> prices = new ArrayList<Integer>();
    private int averagePrice;
    private int start;
    private int finish;

    public parser() throws IOException {
        URL url = new URL(testURL);
        Scanner scan = new Scanner(url.openStream());

        // hasNextLine() pairs with nextLine(); hasNext() can disagree on trailing blank lines.
        while (scan.hasNextLine()) {
            String line = scan.nextLine();

            if (line.contains(string1)) {
                list.add(line);

                // Cut the line down to the text starting at the pound entity.
                start = line.indexOf("&pound;");
                if (start < 0) {
                    continue; // "&pound" without the trailing ';'
                }
                line = line.substring(start);

                // The price ends at the first space or '<'; default to the end of
                // the line so a stale value from an earlier line is never reused.
                finish = line.length();
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == ' ' || line.charAt(i) == '<') {
                        finish = i;
                        break;
                    }
                }

                line = line.substring(0, finish);
                line = line.trim();
                line = line.replace("&pound;", "");
                line = line.replace(",", "");

                try {
                    int price = Integer.parseInt(line);
                    prices.add(price);
                } catch (NumberFormatException e) {
                    // Not a whole number (for example a price range); skip it.
                }
            }
        }
        scan.close();
    }

    public static void main(String args[]) throws IOException {
        parser p = new parser();

        for (Integer x : p.prices) {
            System.out.println(x);
        }
    }
}

Best answer

Rather than reading the page line by line with Scanner, or using regular expressions (!), to pull clean content out of HTML, use something like jsoup:

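Document doc = Jsoup
        .connect(testURL)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
        .timeout(60000).get();
// "div.search-result__price" (no space) matches the price <div> itself;
// with a space it would only match elements nested inside some other <div>.
Elements elems = doc.select("div.search-result__price");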
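The snippet above stops once the price elements are selected, while the question asks for the bare number. A minimal sketch of that last step follows, assuming the prices are whole pounds as in the £2,995 example. The class name `PriceText` and the method `parsePounds` are hypothetical names introduced only for this illustration; they are not part of jsoup or of the original answer. The regular expression here is applied to the short text of one element, not to the HTML itself.

```java
import java.util.OptionalInt;

public class PriceText {

    // Hypothetical helper: turns text such as "£2,995" into 2995.
    // Assumes whole-pound amounts; "£2,995.50" would be misread as 299550.
    static OptionalInt parsePounds(String text) {
        // Keep only ASCII digits, which drops "£", "&pound;", commas and spaces.
        String digits = text.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(digits));
        } catch (NumberFormatException e) {
            // More digits than an int can hold.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        String[] samples = { "£2,995", "&pound;12,450", "POA" };
        for (String sample : samples) {
            OptionalInt price = parsePounds(sample);
            if (price.isPresent()) {
                System.out.println(price.getAsInt());
            }
        }
    }
}
```

With jsoup, each `Element` in `elems` exposes its visible text through `text()`, so a loop over `elems` could pass `e.text()` to `parsePounds` and add every present value to the `prices` list from the question.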

Regarding the Java website parser, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/26640671/
