
java - Downloaded string comes back garbled

Reposted. Author: 行者123. Last updated: 2023-11-29 23:44:40

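I am trying to fetch a web page's source as a string so that I can parse it. What comes back is laid out vaguely like the site's HTML, but the text is nonsense. I am doing this as part of a tutorial, and the source code supplied by the instructor gives me the same problem. It also happens with every website I try. Could something be wrong with my computer or internet connection?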

Log output:

07-26 17:29:49.143 10863-10863/org.andrewedgar.downloadwebcontent I/Result: !otp tl
<-[fl E7> hm ls=n-sl-e ti8l-e"ln=" !edf-><-[fI ] hm ls=n-sl-e ti8 ag"><[ni]-
!-i E8> <tlcas"oj ti9 ag"><[ni]-
!-i tI ]<-><tlcas"oj"ln=e" !-!edf-> <ed
mt hre=uf8> <eanm=vepr"cnet"it=eiewdh nta-cl="
mt ae"ecito"cnet"omi ooulaplnigpg hr nld wsm adn aedms"
mt ae"uhr otn=Wwhmz>
tteZpyoe/il>
ln e=sotu cn ye"mg/-cn rf"sai/m/aio.n"
<- otAeoeCS-> <ikrl"tlset rf"sai/s/otaeoemncs> <- hmf cn S -
ln e=syehe"he=/ttccsteiyioscs> <- lgn otIosCS-> <ikrl"tlset rf"sai/s/lgn-otioscs> <- lgn ieIosCS-> <ikrl"tlset rf"sai/s/lgn-ieioscs> <- otta S -
ln e=syehe"he=/ttccsbosrpmncs> <- lcnvCS-> <ikrl"tlset rf"sai/s/lcnvmncs> <- nmt S -
ln e=syehe"he=/ttccsaiaemncs> <- eoo S -
ln e=syehe"he=/ttccsvnbxvnbxcs> <- W-aoslCS-> <ikrl"tlset rf"sai/s/w.aoslcs> <- anCS-> <ikrl"tlset rf"sai/s/ancs> <- epnieCS-> <ikrl"tlset rf"sai/s/epniecs>
srp r=/ttcj/edrmdrir283rsod142mnj"<srp> <ha> <oydt-p=srl"dt-agt"nveu aaofe=7"
!-i tI ]
pcas"rweugae>o r sn n<togotae<srn>bosr lae< rf"tp/boshpycm"ugaeyu rwe<a oipoeyu xeine<p
!edf->
dvi=peodr
dvcas'odr
dvcas"atr"<dv
/i> <dv<- rlae -
<edri=hae"cas"edrscin> <i ls=cnanr> <a ls=nva"
ahe=# ls=nva-rn"<m d"rnLg"sc"sai/m/apCdLgWtTx.n"at"apcd"<a
dvcas"-lxmn-rp> dvi=nveu ls=mimn"
<lcas"a"
l < aasrl ls=nvln cie rf"hm"Hm sa ls=s-ny>cret<sa>/>/i
/l
<dv
dvcas"eubn> < rf"tp:/er.apcd.o"cas"utn1>er<a
/i> <dv
/a> <dv
/edr !-Hae -
<eto d"oe ls=hr_eto rdat1pdig> <i ls=dslytbe> <i ls=tbecl"
dvcas"otie"
dvcas"eocnet> <1Lancd h<rfnwy/1
pPormigdenthv ob oigtdosadfutan.b>oehv oefnadlanhwt oe<p
ahe=hts/lanzpyoecm ls=bto_"LanNw/> <dv
/i>
/i> <dv
/eto>!-Hr eto -
<- QeyLb-> <citsc"sai/svno/qey11..i.s>/cit
!-BosrpJ -
srp r=/ttcj/edrbosrpmnj"<srp> <- ehrJ -
srp r=/ttcj/edrtte.i.s>/cit
!-wyonsj -
srp r=/ttcj/edrjur.apit.203mnj"<srp> <- lcnvJ -
srp r=/ttcj/edrjur.lcnvmnj"<srp> <- W-aoslJ -
srp r=/ttcj/edrolcrue.i.s>/cit
!-CutrpJ -
srp r=/ttcj/edrjur.oneu.i.s>/cit
!-Sot colJ -
srp r=/ttcj/edrsot-colmnj"<srp> <- edrJ -
srp r=/ttcj/edrvnbxmnj"<srp> <- jxhm S-> <citsc"sai/svno/qeyaacipmnj"<srp> <- o S-> <citsc"sai/svno/o.i.s>/cit
!-Mi S-> <citsc"sai/smi.s>/cit
<bd><hm>

Code:

public class DownloadTask extends AsyncTask<String, Void, String> {

    @Override
    protected String doInBackground(String... urls) {

        String result = "";
        URL url;
        HttpURLConnection urlConnection = null;

        try {
            url = new URL(urls[0]);
            urlConnection = (HttpURLConnection) url.openConnection();
            InputStream in = urlConnection.getInputStream();
            InputStreamReader reader = new InputStreamReader(in);
            int data = reader.read();

            while (data != -1) {
                data = reader.read();
                char current = (char) data;
                result += current;
                data = reader.read();
            }
            return result;

        } catch (Exception e) {
            e.printStackTrace();
            return "Failed";
        }

    }
}

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    DownloadTask task = new DownloadTask();

    String result = null;

    try {
        result = task.execute("http://www.zappycode.com").get();

    } catch (Exception e) {

        e.printStackTrace();
    }

    Log.i("Result", result);
}
}

Best answer

You read from the stream twice in each iteration:

while (data != -1) {
data = reader.read(); // <<- here
char current = (char) data;
result += current;
data = reader.read(); // <<- and here
}

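but you append to the result only once. So you end up with only every other character (the ones at odd positions), which is why the log looks like HTML with half of its letters missing. Something like this should work: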

int data; // declared outside the loop; Java does not allow a declaration inside the condition
while ((data = reader.read()) != -1) result += (char) data;
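To see the effect without a network connection, here is a minimal sketch that runs both loops over an in-memory string. The class name ReadLoopDemo, the method names, and the sample input are made up for illustration and are not part of the original question; a StringBuilder stands in for the question's `result +=`, which does not change which characters end up in the result.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReadLoopDemo {

    // Same read pattern as the question: two read() calls per iteration, one append.
    static String readSkipping(Reader reader) throws IOException {
        StringBuilder result = new StringBuilder();
        int data = reader.read();          // consumed, but never appended
        while (data != -1) {
            data = reader.read();          // this character is appended
            result.append((char) data);
            data = reader.read();          // this one is only compared with -1, then dropped
        }
        return result.toString();
    }

    // Corrected loop: one read() per iteration, every character appended.
    static String readAll(Reader reader) throws IOException {
        StringBuilder result = new StringBuilder();
        int data;
        while ((data = reader.read()) != -1) {
            result.append((char) data);
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>";      // hypothetical sample input
        System.out.println(readSkipping(new StringReader(html))); // prints "hm>bd>"
        System.out.println(readAll(new StringReader(html)));      // prints "<html><body>"
    }
}
```

Note that when the input has an odd number of characters, the skipping loop also appends `(char) -1`, which is the character U+FFFF, because the first read() inside the loop body is appended before it is compared with -1.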

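In general, though, reading raw bytes from the input and converting them to chars one at a time is not a good idea. Something like this would be more robust: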

BufferedReader br = new BufferedReader(reader);
StringBuilder accumulator = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
    accumulator.append(line).append(System.lineSeparator());
}
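As a self-contained sketch of the line-based approach, the helper below wraps the same loop in a method that also closes the reader and takes the charset explicitly. The class name StreamText, the method name readText, and the choice of UTF-8 are assumptions made for this example, not something the original answer specifies; an InputStreamReader constructed without a charset uses the platform default, which may not match the page's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StreamText {

    // Reads the whole stream as text, decoding it with the given charset.
    // Each line is followed by System.lineSeparator(), so the original line
    // terminators are normalised and the result always ends with a separator
    // when the input is non-empty.
    static String readText(InputStream in, Charset charset) throws IOException {
        StringBuilder accumulator = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) on exit
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                accumulator.append(line).append(System.lineSeparator());
            }
        }
        return accumulator.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample bytes standing in for a downloaded page.
        byte[] page = "<html>\n<body>hello</body>\n</html>\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readText(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

In the question's doInBackground, a call along the lines of `readText(urlConnection.getInputStream(), StandardCharsets.UTF_8)` could replace the manual loop, assuming the server actually sends UTF-8.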

This post, "java - Downloaded string comes back garbled", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/51544639/
