
java - Downloading a file with HtmlUnit: the download button has no (apparently) accessible link

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:27:47

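I'm new to programming, haven't been able to find an answer that fits my problem, and am not sure where else to turn. As the title says, I want to download a file using HtmlUnit in Java, but the download button on the page has no href or onclick that I can access. The button looks like this: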

<button class="btn btn-download btn-primary pull-right" id="eta_download" style="display: block;">
<span class="glyphicon glyphicon-download-alt"></span>
</button>

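Clicking this button in a normal browser triggers a short period of processing and loading, then opens a new tab that starts the download of a gzip file containing TIFF satellite images. I'm doing this from a Swing application.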

The site I need to download gzipped tiff from

Can anyone help me get this working?

My code is as follows:

// Call from within a new Thread. Get the download
private void getDownload(String latitude, String longitude, String start, String end) throws Exception
{
    // Create the browser
    final WebClient webClient = new WebClient(BrowserVersion.CHROME);

    // Report to user. Loading page...
    SwingUtilities.invokeLater(new Runnable()
    {
        public void run()
        {
            reportLabel.setText("Loading EEFLUX...");
        }
    });

    // Load page
    HtmlPage page = webClient.getPage("https://eeflux-level1.appspot.com/");

    // Report to user change in state
    SwingUtilities.invokeLater(new Runnable()
    {
        public void run()
        {
            reportLabel.setText("Filling in values");
        }
    });

    // Get Latitude, Longitude and Date Fields
    HtmlInput latitudeField = (HtmlInput) page.getElementById("latitude");
    HtmlInput longitudeField = (HtmlInput) page.getElementById("longitude");
    HtmlInput date_start_Field = (HtmlInput) page.getElementById("date_start");
    HtmlInput date_end_Field = (HtmlInput) page.getElementById("date_end");

    // Set the values of fields to those passed into the method
    latitudeField.setAttribute("value", latitude);
    longitudeField.setAttribute("value", longitude);
    date_start_Field.setAttribute("value", start);
    date_end_Field.setAttribute("value", end);

    // Get the Search "Button" then click
    HtmlAnchor search = (HtmlAnchor) page.getHtmlElementById("searchForImages");
    page = search.click();

    // Wait for JavaScript jobs to finish
    JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
    for (int i = 0; manager.getJobCount() > 7; i++)
    {
        final int j = i;
        // Report to user change in state
        SwingUtilities.invokeLater(new Runnable()
        {
            public void run()
            {
                reportLabel.setText("Loading after Search: " + j);
            }
        });

        Thread.sleep(1000);
    }

    // Get the list of regions Satellites captured and click to open dropdown
    HtmlDivision image_dropdown = (HtmlDivision) page.getElementById("image_dropdown");
    image_dropdown.click();

    // Get the list of regions
    HtmlUnorderedList region_list = (HtmlUnorderedList) image_dropdown.getLastElementChild();

    // Get iterator for list
    Iterator<DomElement> web_list = region_list.getChildElements().iterator();

    // Report to user change in state
    SwingUtilities.invokeLater(new Runnable()
    {
        public void run()
        {
            reportLabel.setText("Accessing region list");
        }
    });

    // For each Element, download Actual ET image (and later Grass Reference)
    while (web_list.hasNext())
    {
        DomElement region = web_list.next();

        System.out.println(region.getTextContent());

        HtmlPage page2 = region.click();

        // Wait for JavaScript jobs to finish
        manager = page2.getEnclosingWindow().getJobManager();
        for (int i = 0; manager.getJobCount() > 2; i++)
        {
            final int j = i;
            // Report to user
            SwingUtilities.invokeLater(new Runnable()
            {
                public void run()
                {
                    reportLabel.setText("Loading Image Type page: " + j);
                }
            });
            System.out.println(manager.getJobCount());
            Thread.sleep(1000);
        }

        // Get the Actual ET download Button
        HtmlButton ETButton = page2.getHtmlElementById("eta_download");

        // Get the Download Page????
        HtmlPage page3 = ETButton.click();
        UnexpectedPage download_ET = new UnexpectedPage(page3.getWebResponse(), page3.getEnclosingWindow());

        // Get the Stream
        GZIPInputStream in_ET = (GZIPInputStream) download_ET.getWebResponse().getContentAsStream();

        // Try writing the stream (to standard out for now)
        try
        {
            byte[] buffer = new byte[2048];

            int len;
            while ((len = in_ET.read(buffer)) != -1)
            {
                System.out.write(buffer, 0, len);
            }
        }
        finally
        {
            // Close the stream
            in_ET.close();
        }
        // Just do one till this works
        break;
    }
}

Best answer

This is a good start :) I looked at the request that is sent when the button is clicked:

Post request when clicking the button

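As you can see, several parameters are sent (latitude, longitude, date range, image ID), and the response contains the download URL. The request is generated by some JavaScript code, probably something like this: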

function downloadImage(divName, urlProduct) {
    $(document).ready(function() {
        $(divName).on('click', function() {
            onlyshowLoading();
            $.ajax({
                url: urlProduct,
                type: "POST",
                data: JSON.stringify({
                    "lat": $('#latitude').val(),
                    "lng": $('#longitude').val(),
                    "date_info": $('#date_start').val() + ' to ' + $('#date_end').val(),
                    'image_id': $("#dropdown:first-child").text().split(" / ")[1],
                }),
                dataType: 'json',
                cache: true,
                error: function() {
                    AjaxOnError();
                },
                success: function(data) {
                    AjaxOnSuccess();
                    if (typeof ETa_adjusted == "undefined" || ETa_adjusted == null) {
                        $("#ETrF_adjusted").hide();
                        $("#EToF_adjusted").hide();
                        $("#ETa_adjusted").hide();
                        $("#etrF_adj_download").hide();
                        $("#etoF_adj_download").hide();
                        $("#eta_adj_download").hide();
                    } else {
                        $("#ETrF_adjusted").show();
                        $("#EToF_adjusted").show();
                        $("#ETa_adjusted").show();
                        $("#etrF_adj_download").show();
                        $("#etoF_adj_download").show();
                        $("#eta_adj_download").show();
                    }
                    var key = Object.keys(data);
                    typeName = data[key];
                    window.open(typeName.url, '_blank');
                }
            });
        });
    });
}

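So HtmlUnit may be failing to execute this code, because of jQuery or for some other reason. You can create your own WebRequest object and reproduce the JavaScript logic yourself; the response will then give you the download URL.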
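A rough sketch of that idea follows. To stay dependency-free it uses the JDK's `java.net.http.HttpClient` (Java 11+) instead of HtmlUnit's `WebRequest`; with HtmlUnit the same body could be attached to a POST `WebRequest` via `setRequestBody`. Several things here are assumptions rather than facts about the site: the class and method names are made up for illustration, the endpoint URL (the `urlProduct` argument in the JavaScript) has to be read from the browser's network tab, the `Content-Type` header is a guess, and the response is assumed to look like `{"<key>": {"url": "..."}}` as the `success` callback suggests. The regex is only a stand-in for a real JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; none of these names come from the site or from HtmlUnit.
final class EefluxDownloadSketch {

    // Wrap a value as a JSON string literal (escapes backslashes and double quotes only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Mirror the JSON.stringify({...}) call in the page's click handler.
    static String buildPayload(String lat, String lng, String start, String end, String imageId) {
        return "{"
            + "\"lat\":" + quote(lat) + ","
            + "\"lng\":" + quote(lng) + ","
            + "\"date_info\":" + quote(start + " to " + end) + ","
            + "\"image_id\":" + quote(imageId)
            + "}";
    }

    // Pull the first "url" value out of the response body (assumed shape, crude regex).
    static String extractUrl(String json) {
        Matcher m = Pattern.compile("\"url\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"url\" field in response");
        }
        // JSON encoders may escape forward slashes as \/
        return m.group(1).replace("\\/", "/");
    }

    // POST the payload, read the download URL from the reply, then save the file as-is.
    static void download(String endpoint, String payload, Path target) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpRequest post = HttpRequest.newBuilder(URI.create(endpoint))
            .header("Content-Type", "application/json") // assumption: server accepts this
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();
        String body = client.send(post, HttpResponse.BodyHandlers.ofString()).body();

        HttpRequest get = HttpRequest.newBuilder(URI.create(extractUrl(body))).GET().build();
        client.send(get, HttpResponse.BodyHandlers.ofFile(target));
    }
}
```

The image ID would still have to come from the region dropdown (the text after " / " in the selected entry), and `download` should be called off the Swing event thread, as in the question's code.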

This is an interesting topic. If you want to learn more, I'm writing an e-book about web scraping with Java: Java Web Scraping Handbook

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/50337863/
