
java.net.MalformedURLException: no protocol: /intl/en/policies/ on GET request

Reposted. Author: 可可西里. Updated: 2023-11-01 16:26:59

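I have been trying to write a simple program that walks through all the links on a page, visits each one, and then recurses. However, it stops at runtime with this error: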

java.net.MalformedURLException: no protocol: /intl/en/policies/
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at java.net.URL.<init>(Unknown Source)
at me.dylan.WebCrawler.WebC.sendGetRequest(WebC.java:67)
at me.dylan.WebCrawler.WebC.<init>(WebC.java:27)
at me.dylan.WebCrawler.WebC.main(WebC.java:36)

My code:

package me.dylan.WebCrawler;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;

import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class WebC {
    // FileUtil f;
    int linkamount=0;
    ArrayList<URL> visited = new ArrayList<URL>();
    ArrayList<String> urls = new ArrayList<String>();
    public WebC() {

        try {
            // f= new FileUtil();
            sendGetRequest("http://www.google.com");
        } catch (IOException e) {
            e.printStackTrace();
        }
        catch (BadLocationException e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        new WebC();
    }
    public void sendGetRequest(String path) throws IOException, BadLocationException, MalformedURLException {

        URL url = new URL(path);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");
        con.setRequestProperty("Content-Language", "en-US");
        BufferedReader rd = new BufferedReader(new InputStreamReader(con.getInputStream()));
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", new Boolean(true));
        kit.read(rd, doc, 0);

        //Get all <a> tags (hyperlinks)
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid())
        {
            MutableAttributeSet mas = (MutableAttributeSet)it.getAttributes();
            //get the HREF attribute value in the <a> tag
            String link = (String)mas.getAttribute(HTML.Attribute.HREF);
            if(link!=null && link!="") {
                urls.add(link);
            }

            it.next();
        }
        for(int i=urls.size()-1;i>=0;i--) {
            if(urls.get(i)!=null) {
                if(/*f.searchforString(urls.get(i)) ||*/ visited.contains(new URL(urls.get(i)))) {
                    urls.remove(i);
                    continue;
                } else {
                    System.out.println(linkamount++);
                    System.out.println(path);
                    visited.add(new URL(path));
                    //f.write(urls.get(i));
                    sendGetRequest(urls.get(i));
                }
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

Honestly, I have no idea how to fix this. Apparently Google has an href that is not a valid URL. How can I work around this?

Best answer

You have to prepend the base URL to the URL fragment. The URL object expects an absolute form such as http://abc.com/int/etc/etc.
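To make the difference concrete, here is a minimal sketch that checks whether a string is accepted by the single-argument java.net.URL constructor. The class name AbsoluteUrlCheck and the helper parsesAsAbsoluteUrl are hypothetical names used for illustration only; they are not part of the question's code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class, not part of the original WebC code.
final class AbsoluteUrlCheck {

    // Returns true if new URL(spec) accepts the string on its own,
    // i.e. the string carries a protocol such as http.
    static boolean parsesAsAbsoluteUrl(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // A relative href such as /intl/en/policies/ ends up here ("no protocol").
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsAbsoluteUrl("http://abc.com/int/etc/etc")); // true
        System.out.println(parsesAsAbsoluteUrl("/intl/en/policies/"));         // false
    }
}
```

The second call fails for the same reason as the stack trace in the question: the href has no protocol, so the constructor cannot interpret it without a base.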

The hrefs on the page, however, are in relative form. The simple fix is to prepend http://www.google.com to every HREF you obtain before making the call.
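As one possible alternative to plain string concatenation, the two-argument constructor URL(URL context, String spec) resolves an href against the page it was found on, which also leaves hrefs that are already absolute untouched. The sketch below assumes this approach; the class LinkResolver and the method resolveOrNull are hypothetical names, not part of the question's code.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, shown only to illustrate resolving hrefs against a base URL.
final class LinkResolver {

    // Resolves href against the URL of the page it came from.
    // Returns null when either string cannot be turned into a URL,
    // so the caller can skip that link instead of crashing.
    static String resolveOrNull(String pageUrl, String href) {
        try {
            URL base = new URL(pageUrl);
            return new URL(base, href).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Root-relative href, as in the stack trace from the question.
        System.out.println(resolveOrNull("http://www.google.com/", "/intl/en/policies/"));
        // Href relative to the current directory of the page.
        System.out.println(resolveOrNull("http://www.google.com/intl/en/", "about.html"));
        // Already absolute: the base is ignored.
        System.out.println(resolveOrNull("http://www.google.com/", "https://example.com/x"));
    }
}
```

In the crawler, the idea would be to pass the resolved string to sendGetRequest rather than the raw href, and to skip links for which the helper returns null (hrefs with a protocol Java does not know, such as javascript:, are likely to land in that case).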

Regarding java.net.MalformedURLException: no protocol: /intl/en/policies/ on GET request, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15708165/
