
java - Fetching a web page using sockets

Reposted. Author: 搜寻专家. Updated: 2023-11-01 02:17:20

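I am currently learning socket programming and have run into a problem I need help with. I am trying to write a small Java class that connects to a web host, downloads the default page, and then disconnects from the host. I know it is simpler to do this with URLConnection, but I am trying to learn the Socket classes. I can connect to the web server successfully, but I cannot pull down the page. Here is what I have working (and not working) so far: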

import java.io.*;
import java.net.*;
import java.lang.IllegalArgumentException;
public class SocketsFun{
    public static void main(String[] myArgs){
        // Set some variables
        String theServer = null;
        String theLine = null;
        int thePort = 0;
        Socket theSocket = null;
        boolean exit = false;
        boolean socketCheck = false;
        BufferedReader theInput = null;

        // Grab the server and port number
        try{
            theServer = myArgs[0];
            thePort = Integer.parseInt(myArgs[1]);
            System.out.println("Opening a connection to " + theServer + " on port " + thePort);
        } catch(ArrayIndexOutOfBoundsException aioobe){
            System.out.println("usage: SocketsFun host port");
            exit = true;
        } catch(NumberFormatException nfe) {
            System.out.println("usage: SocketsFun host port");
            exit = true;
        }

        if(!exit){
            // Open the socket
            try{
                theSocket = new Socket(theServer, thePort);
            } catch(UnknownHostException uhe){
                System.out.println("* " + theServer + " does not exist");
            } catch(IOException ioe){
                System.out.println("* " + "Connection Refused");
            } catch(IllegalArgumentException iae){
                System.out.println("* " + thePort + " Not A Valid TCP/UDP Port.");
            }

            // Print out some stuff
            try{
                System.out.println("Connected Socket: " + theSocket.toString());
            } catch(Exception e){
                System.out.println("* " + "No Open Socket");
            }

            try{
                theInput = new BufferedReader(new InputStreamReader(theSocket.getInputStream()));
                while ((theLine = theInput.readLine()) != null){
                    System.out.println(theLine);
                }
                theInput.close();
            } catch(IOException ioe){
                System.out.println("* " + "No Data To Read");
            } catch(NullPointerException npe){
                System.out.println("* " + "No Data To Read");
            }

            // Close the socket
            try{
                socketCheck = theSocket.isConnected();
            } catch(NullPointerException npe){
                System.out.println("* " + "No Socket To Close");
            }
        }
    }
}

All I want is for this class to print what "curl", "lynx -dump", or "wget" would output. Any and all help is appreciated.

Best answer

You have the right idea, but you are not submitting an HTTP request. Send:

GET / HTTP/1.1\r\nHost: <hostname>\r\n\r\n

This follows the format:

[METHOD] [PATH] HTTP/1.1[CRLF]Host: [HOSTNAME][CRLF]OTHER: HEADERS[CRLF][CRLF]

You should get back a response in a similar format: headers, a blank line, and then the data. Read more about the HTTP protocol.
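As a minimal sketch of the request format described above, the helper below assembles the request string with explicit CRLF line endings. The class name `HttpRequestFormat`, the method `buildGetRequest`, and the host `example.com` are hypothetical names used only for illustration. The `Connection: close` header is not part of the answer; it is added here on the assumption that you want the server to close the connection after responding rather than keep it alive, which is the HTTP/1.1 default.

```java
public class HttpRequestFormat {
    // Builds a minimal HTTP/1.1 GET request. Each line ends in CRLF, and an
    // empty line (a lone CRLF) marks the end of the header section.
    public static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             // Assumption, not in the original answer: ask the server to
             // close the connection once the response has been sent.
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        String request = buildGetRequest("example.com", "/");
        // Print each CRLF as a visible "\r\n" marker so the structure shows.
        System.out.print(request.replace("\r\n", "\\r\\n\n"));
    }
}
```

Forgetting the final empty line is a common mistake: the server keeps waiting for more headers and never replies.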

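Edit: When you are starting out, it may help to learn the HTTP request syntax. It is simple, and it is a good thing to know in general. Open a terminal and use netcat (preferably) or telnet: netcat google.com 80 or telnet google.com 80. Type: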

GET / HTTP/1.1[ENTER]Host: google.com[ENTER][ENTER]

I get this response (after the second return):

HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Thu, 09 Dec 2010 00:03:39 GMT
Expires: Sat, 08 Jan 2011 00:03:39 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
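The response above has the three parts mentioned earlier: a status line with headers, a blank line, and the data. As a rough sketch, assuming the whole response is already held in a string with its CRLF line endings intact, it could be split like this. `ResponseParts` is a hypothetical class, and the sample body in `main` is a shortened placeholder, not the real page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResponseParts {
    public final String statusLine;
    public final Map<String, String> headers = new LinkedHashMap<>();
    public final String body;

    // Splits a raw HTTP response at the first blank line (CRLF CRLF).
    // Simplifications: header names are stored exactly as sent, and a
    // repeated header overwrites the earlier value.
    public ResponseParts(String raw) {
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("no blank line after headers");
        }
        String[] head = raw.substring(0, split).split("\r\n");
        statusLine = head[0];
        for (int i = 1; i < head.length; i++) {
            int colon = head[i].indexOf(':');
            if (colon > 0) {
                headers.put(head[i].substring(0, colon).trim(),
                            head[i].substring(colon + 1).trim());
            }
        }
        body = raw.substring(split + 4); // skip the four CRLF CRLF characters
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 301 Moved Permanently\r\n"
                   + "Location: http://www.google.com/\r\n"
                   + "Server: gws\r\n"
                   + "\r\n"
                   + "<HTML>placeholder body</HTML>";
        ResponseParts parts = new ResponseParts(raw);
        System.out.println("Status:   " + parts.statusLine);
        System.out.println("Location: " + parts.headers.get("Location"));
        System.out.println("Body:     " + parts.body);
    }
}
```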

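Once you understand the request syntax, just write it to the socket and then read lines until the server closes the connection, as you are already doing.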
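Putting that together, one possible sketch of the write-then-read sequence is shown below. It is not the asker's class rewritten; `SocketFetch` and `fetch` are hypothetical names, and the `Connection: close` header is an addition to the answer's request, included on the assumption that without it an HTTP/1.1 server may keep the connection open so that `readLine()` never returns null.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketFetch {
    // Connects, sends a GET request for "/", and returns everything the
    // server sends back (status line, headers, blank line, body).
    public static String fetch(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            String request = "GET / HTTP/1.1\r\n"
                           + "Host: " + host + "\r\n"
                           + "Connection: close\r\n" // assumption: see lead-in
                           + "\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush(); // make sure the request actually leaves the buffer

            // ISO-8859-1 maps every byte to a character, so no byte is lost
            // even if the body uses a different encoding.
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            // readLine() returns null once the server closes the connection.
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        } // try-with-resources closes the socket and its streams
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: SocketFetch host port");
            return;
        }
        System.out.print(fetch(args[0], Integer.parseInt(args[1])));
    }
}
```

Running it as `java SocketFetch google.com 80` should print a response along the lines of the one shown above, though the exact headers will differ.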

For java - Fetching a web page using sockets, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/4393276/
