
java - URLConnection does not handle Content-Length correctly through a proxy

Reposted. Author: 搜寻专家. Updated: 2023-10-31 20:22:36

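I have run into the following problem: when URLConnection is used through a proxy, the content length is always reported as -1.

First, I checked that the proxy really does return Content-Length (lynx and wget also go through the proxy; there is no other way to reach the internet from the local network):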

$ lynx -source -head ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
HTTP/1.1 200 OK
Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
Content-Type: application/x-zip-compressed
Content-Length: 30745
Connection: close
Date: Thu, 02 Feb 2012 17:18:52 GMT

$ wget -S -X HEAD ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
--2012-04-03 19:36:54-- ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip
Resolving proxy... 10.10.0.12
Connecting to proxy|10.10.0.12|:8080... connected.
Proxy request sent, awaiting response...
HTTP/1.1 200 OK
Last-Modified: Mon, 09 Jul 2007 17:02:37 GMT
Content-Type: application/x-zip-compressed
Content-Length: 30745
Connection: close
Age: 0
Date: Tue, 03 Apr 2012 17:36:54 GMT
Length: 30745 (30K) [application/x-zip-compressed]
Saving to: `WO2003-104476-001.zip'

In Java, I wrote:

URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
int length = url.openConnection().getContentLength();
logger.debug("Got length: " + length);

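I get -1. I started debugging FtpURLConnection, and it turned out that the necessary information is in the underlying HttpURLConnection.responses field, but the content length is never populated from it:

[Debugger screenshot: the response headers include Content-Length: 30745.] The content length is not updated when you start reading the stream, or even after the stream has been read completely. Code: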

URL url = new URL("ftp://ftp.wipo.int/pub/published_pct_sequences/publication/2003/1218/WO03_104476/WO2003-104476-001.zip");
URLConnection connection = url.openConnection();

logger.debug("Got length (1): " + connection.getContentLength());

InputStream input = connection.getInputStream();

byte[] buffer = new byte[4096];
int count = 0, len;
while ((len = input.read(buffer)) > 0) {
    count += len;
}

logger.debug("Got length (2): " + connection.getContentLength() + " but wanted " + count);

Output:

Got length (1): -1
Got length (2): -1 but wanted 30745

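This looks like a JDK 6 bug, so I filed a new bug, #7168608.

  • I would appreciate help writing code that returns the correct content length for a direct FTP connection, an FTP connection through a proxy, and a local file:/ URL.
  • If the problem cannot be worked around with JDK 6, please recommend any other library (Apache Http Client?) that definitely works in all of the cases I mentioned.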

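Best answer

Keep in mind that proxies often alter the representation of the underlying entity. In your case, I suspect the proxy may be changing the transfer encoding, which in turn would make the Content-Length meaningless even when it is supplied.

You are running up against the following two sections of the HTTP 1.1 specification: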

4.4 Message Length

  1. ...
  2. ...
  3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored.

14.41 Transfer-Encoding

The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity.

Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding

Transfer-codings are defined in section 3.6. An example is:

Transfer-Encoding: chunked

If multiple encodings have been applied to an entity, the transfer-codings MUST be listed in the order in which they were applied. Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.

Many older HTTP/1.0 applications do not understand the Transfer-Encoding header.

So, per the specification, URLConnection ignores the Content-Length header, because it is meaningless when a chunked transfer encoding is present.
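To make the rule from section 4.4 concrete, here is a minimal sketch of how a client could decide whether a Content-Length value is usable. It is not the JDK's actual implementation; the class name MessageLength and the methods effectiveContentLength and firstValue are hypothetical names chosen for illustration. The header map has the same shape as the one returned by URLConnection.getHeaderFields().

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MessageLength {

    /**
     * Returns the body length that may be trusted, or -1 if it cannot be
     * derived from the headers. Follows the quoted rule: when a
     * Transfer-Encoding header is present, Content-Length is ignored.
     */
    public static long effectiveContentLength(Map<String, List<String>> headers) {
        String transferEncoding = firstValue(headers, "Transfer-Encoding");
        if (transferEncoding != null
                && !transferEncoding.trim().equalsIgnoreCase("identity")) {
            return -1; // a transfer-coding is applied, so Content-Length is ignored
        }
        String contentLength = firstValue(headers, "Content-Length");
        if (contentLength == null) {
            return -1;
        }
        try {
            long value = Long.parseLong(contentLength.trim());
            return value >= 0 ? value : -1;
        } catch (NumberFormatException e) {
            return -1; // malformed header value
        }
    }

    /** Case-insensitive header lookup; tolerates a null key (the status line). */
    public static String firstValue(Map<String, List<String>> headers, String name) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String key = entry.getKey();
            if (key != null && key.equalsIgnoreCase(name) && !entry.getValue().isEmpty()) {
                return entry.getValue().get(0);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, List<String>> plain = new HashMap<String, List<String>>();
        plain.put("Content-Length", Collections.singletonList("30745"));
        System.out.println(effectiveContentLength(plain));   // prints 30745

        Map<String, List<String>> chunked = new HashMap<String, List<String>>(plain);
        chunked.put("Transfer-Encoding", Collections.singletonList("chunked"));
        System.out.println(effectiveContentLength(chunked)); // prints -1
    }
}
```

The second case is the situation suspected here: both headers arrive, and the length must be treated as unknown.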

From your debugger screenshot, it is not clear whether a Transfer-Encoding header is present. Please let us know...
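One way to check this without relying on lynx or on the debugger is to talk to the proxy over a raw socket and print every header it returns. The following is only a sketch under assumptions: the proxy address 10.10.0.12:8080 is taken from the wget output in the question, the class ProxyHeaderDump and its methods are hypothetical names, and whether the proxy accepts a plain GET for an ftp:// URL in this form is not confirmed. It sends GET rather than HEAD, since a HEAD response may not carry the same transfer headers, and it stops reading after the header block. Run without arguments, it only parses a canned sample (the headers shown in the question); pass the proxy host and port to query a real proxy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.StringReader;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ProxyHeaderDump {

    /** Builds a proxy-style request: the request line carries the absolute URL. */
    public static String buildRequest(String targetUrl, String targetHost) {
        return "GET " + targetUrl + " HTTP/1.1\r\n"
                + "Host: " + targetHost + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /** Reads the status line and headers, stopping at the blank line before the body. */
    public static List<String> readHeaderLines(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    /** Case-insensitive check for a header name among the raw header lines. */
    public static boolean hasHeader(List<String> lines, String name) {
        String prefix = name.toLowerCase(Locale.ROOT) + ":";
        for (String line : lines) {
            if (line.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines;
        if (args.length >= 2) {
            // Assumed usage: java ProxyHeaderDump 10.10.0.12 8080
            String target = "ftp://ftp.wipo.int/pub/published_pct_sequences/publication/"
                    + "2003/1218/WO03_104476/WO2003-104476-001.zip";
            Socket socket = new Socket(args[0], Integer.parseInt(args[1]));
            try {
                OutputStream out = socket.getOutputStream();
                out.write(buildRequest(target, "ftp.wipo.int").getBytes("ISO-8859-1"));
                out.flush();
                lines = readHeaderLines(new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), "ISO-8859-1")));
            } finally {
                socket.close(); // the body is deliberately not read
            }
        } else {
            // Canned sample: the response headers quoted in the question.
            String sample = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: application/x-zip-compressed\r\n"
                    + "Content-Length: 30745\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            lines = readHeaderLines(new BufferedReader(new StringReader(sample)));
        }
        for (String line : lines) {
            System.out.println(line);
        }
        System.out.println("Transfer-Encoding present: " + hasHeader(lines, "Transfer-Encoding"));
    }
}
```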

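Further investigation: it appears that when you run lynx -head, lynx does not show all of the headers that were returned. It is not showing the Transfer-Encoding header, which is critical to this discussion.

Here is evidence of the discrepancy, using a publicly visible website: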

Ξ▶ lynx -useragent='dummy' -source -head http://www.bbc.co.uk                                                                                                                  
HTTP/1.1 302 Found
Server: Apache
X-Cache-Action: PASS (non-cacheable)
X-Cache-Age: 0
Content-Type: text/html; charset=iso-8859-1
Date: Tue, 03 Apr 2012 13:33:06 GMT
Location: http://www.bbc.co.uk/mobile/
Connection: close

Ξ▶ wget -useragent='dummy' -S -X HEAD http://www.bbc.co.uk
--2012-04-03 14:33:22-- http://www.bbc.co.uk/
Resolving www.bbc.co.uk... 212.58.244.70
Connecting to www.bbc.co.uk|212.58.244.70|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: Apache
Cache-Control: private, max-age=15
Etag: "7e0f292b2e5e4c33cac1bc033779813b"
Content-Type: text/html
Transfer-Encoding: chunked
Date: Tue, 03 Apr 2012 13:33:22 GMT
Connection: keep-alive
X-Cache-Action: MISS
X-Cache-Age: 0
X-LB-NoCache: true
Vary: Cookie

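Since I am obviously not on your network, I cannot reproduce your exact situation, but please verify that you really are not getting a Transfer-Encoding header when going through the proxy.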
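If the length does turn out to be unavailable from the headers, one possible fallback is to trust getContentLength() when it is non-negative and otherwise count the bytes while reading, as the question's own loop does. This is a sketch, not something proposed in the original answer; LengthFallback, lengthOf and countBytes are hypothetical names. The obvious cost is that the fallback path downloads the whole body just to learn its size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class LengthFallback {

    /** Reads the stream to the end and returns the number of bytes seen. */
    public static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int len;
        while ((len = in.read(buffer)) != -1) {
            total += len;
        }
        return total;
    }

    /** Uses the reported length when available, otherwise counts the body. */
    public static long lengthOf(URLConnection connection) throws IOException {
        int reported = connection.getContentLength();
        if (reported >= 0) {
            return reported;
        }
        InputStream in = connection.getInputStream();
        try {
            return countBytes(in);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Assumed usage: java LengthFallback file:/tmp/some.zip
            System.out.println(lengthOf(new URL(args[0]).openConnection()));
        } else {
            // Offline demonstration with an in-memory stream of known size.
            System.out.println(countBytes(new ByteArrayInputStream(new byte[30745]))); // prints 30745
        }
    }
}
```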

Regarding java - URLConnection does not handle Content-Length correctly through a proxy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9607290/
