
java - Unable to scrape an HTML website?

Reposted. Author: 行者123. Last updated: 2023-12-01 05:05:18

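So I'm trying to get my app to visit a website, grab the HTML from it, strip out the unnecessary elements, and then load the "content" into my makeshift app, since I don't have an API or a feed. I'm using Jsoup; it works when I'm not doing the scraping on Android, but Android doesn't like it.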

```java
import android.app.Activity;
import android.app.ProgressDialog;
import android.os.Bundle;
import android.view.Window;
import android.webkit.WebView;
import android.widget.Button;
import android.widget.Toast;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SimpleDiggActivity extends Activity {

    private WebView browser;
    final Activity activity = this;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        this.getWindow().requestFeature(Window.FEATURE_PROGRESS);

        setContentView(R.layout.main);

        getWindow().setFeatureInt(Window.FEATURE_PROGRESS, Window.PROGRESS_VISIBILITY_ON);

        String url = "http://www.digg.com";
        Document digg;
        browser = (WebView) findViewById(R.id.mybrowser);
        final Button homeDigg = (Button) findViewById(R.id.button1);

        // SimpleWebViewClient is the asker's own WebViewClient subclass (not shown)
        browser.setWebViewClient(new SimpleWebViewClient());

        browser.getSettings().setJavaScriptEnabled(true);
        browser.getSettings().setUseWideViewPort(true);
        browser.getSettings().setLoadWithOverviewMode(true);
        browser.getSettings().setDisplayZoomControls(false);
        browser.getSettings().setEnableSmoothTransition(true);
        browser.getSettings().setBuiltInZoomControls(true);
        browser.getSettings().setUserAgentString("Android");

        // progressCircle = ProgressDialog.show(SimpleDiggActivity.this, "", "Loading...");
        final ProgressDialog progressCircle = new ProgressDialog(activity);
        progressCircle.setProgressStyle(ProgressDialog.STYLE_SPINNER);
        progressCircle.setMessage("Loading...");
        progressCircle.setCancelable(false);

        try {
            // Each Toast marks how far execution got before falling into the catch block
            Toast.makeText(getApplicationContext(), "No Steps down", Toast.LENGTH_SHORT).show();
            // Network I/O on the main (UI) thread
            Document diggTest = Jsoup.connect("http://digg.com/enable/mobile").get();
            Toast.makeText(getApplicationContext(), "1 Steps down", Toast.LENGTH_SHORT).show();
            String diggTitle = diggTest.title();
            Toast.makeText(getApplicationContext(), "2 Steps down", Toast.LENGTH_SHORT).show();
            Document compressed = Jsoup.parseBodyFragment(diggTitle);
            Toast.makeText(getApplicationContext(), "3 Steps down", Toast.LENGTH_SHORT).show();
            org.jsoup.select.Elements div = diggTest.select("div");
            Toast.makeText(getApplicationContext(), "4 Steps down", Toast.LENGTH_SHORT).show();
            String divBrow = div.toString();
            Toast.makeText(getApplicationContext(), "5 Steps down", Toast.LENGTH_SHORT).show();
            // loadUrl() expects a URL, not markup; raw HTML goes through loadDataWithBaseURL()
            browser.loadDataWithBaseURL("http://digg.com/", divBrow, "text/html", "UTF-8", null);
        } catch (Exception e) {
            e.printStackTrace();

            Toast.makeText(getApplicationContext(), "Gave up", Toast.LENGTH_SHORT).show();
            String diggBrow = url;
            browser.loadUrl("http://www.google.com");
        }
    }
}
```

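Sorry if it's a bit messy; I'm just messing around, and this is my first time. The Toasts are there so I can tell when the code gives up and falls into the catch block. When I run it, it never gets past this line:

```java
Document diggTest = Jsoup.connect("http://digg.com/enable/mobile").get();
```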
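The thread never shows the stack trace, so this is an assumption: a likely reason that line fails is that Android 3.0 (API level 11) and later throw `NetworkOnMainThreadException` when an app targeting API 11+ does network I/O on the main thread, which is where `onCreate()` runs. The `catch (Exception e)` block then swallows the exception and shows "Gave up". The usual remedy is to run the fetch on a worker thread and hand the result back to the UI thread. The plain-Java sketch below shows that hand-off with `CompletableFuture`; `BackgroundFetch` and `fetchThen` are hypothetical names, and the UI thread is modeled as an `Executor` so the sketch runs outside Android. On Android, `activity::runOnUiThread` could serve as that executor, but `CompletableFuture` requires API level 24 there, so older devices would use a plain `Thread` or `AsyncTask` for the same pattern.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper (not part of Jsoup or Android): run a blocking fetch on a
// worker executor, then deliver the result or the failure on a "UI" executor.
final class BackgroundFetch {

    private BackgroundFetch() {
    }

    static CompletableFuture<Void> fetchThen(Supplier<String> fetch,
                                             Executor worker,
                                             Executor ui,
                                             Consumer<String> onSuccess,
                                             Consumer<Throwable> onError) {
        return CompletableFuture
                .supplyAsync(fetch, worker)          // blocking I/O happens off the UI thread
                .handleAsync((html, error) -> {      // callback is scheduled on the UI executor
                    if (error != null) {
                        // supplyAsync wraps whatever the fetch threw in a CompletionException
                        Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                                ? error.getCause()
                                : error;
                        onError.accept(cause);
                    } else {
                        onSuccess.accept(html);
                    }
                    return null;
                }, ui);
    }
}
```

Inside the activity this might be wired up roughly as `BackgroundFetch.fetchThen(() -> { try { return Jsoup.connect("http://digg.com/enable/mobile").get().html(); } catch (IOException e) { throw new UncheckedIOException(e); } }, worker, activity::runOnUiThread, html -> browser.loadDataWithBaseURL("http://digg.com/", html, "text/html", "UTF-8", null), error -> Log.e("SimpleDiggActivity", "fetch failed", error));`, with `worker` being, for example, `Executors.newSingleThreadExecutor()`. The checked `IOException` has to be wrapped because `Supplier.get()` cannot throw it.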

Best answer

I tried your code with Jsoup version 1.7.1, and it works fine on my end. Here is the working code:

```java
import android.app.Activity;
import android.app.ProgressDialog;
import android.os.Bundle;
import android.util.Log;
import android.view.Window;
import android.widget.Toast;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SimpleDiggActivity extends Activity {

    final Activity activity = this;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        this.getWindow().requestFeature(Window.FEATURE_PROGRESS);

        setContentView(R.layout.activity_simple_digg);

        getWindow().setFeatureInt(Window.FEATURE_PROGRESS,
                Window.PROGRESS_VISIBILITY_ON);

        String url = "http://www.digg.com";
        Document digg;

        // progressCircle = ProgressDialog.show(SimpleDiggActivity.this, "",
        // "Loading...");
        final ProgressDialog progressCircle = new ProgressDialog(activity);
        progressCircle.setProgressStyle(ProgressDialog.STYLE_SPINNER);
        progressCircle.setMessage("Loading...");
        progressCircle.setCancelable(false);

        try {
            Toast.makeText(getApplicationContext(), "No Steps down",
                    Toast.LENGTH_SHORT).show();
            // Network I/O on the main thread: tolerated before Android 3.0, but apps
            // targeting API 11+ get NetworkOnMainThreadException on 3.0 and later
            Document diggTest = Jsoup.connect("http://digg.com/enable/mobile")
                    .get();
            Toast.makeText(getApplicationContext(), "1 Steps down",
                    Toast.LENGTH_SHORT).show();
            String diggTitle = diggTest.title();
            Toast.makeText(getApplicationContext(), "2 Steps down",
                    Toast.LENGTH_SHORT).show();
            Document compressed = Jsoup.parseBodyFragment(diggTitle);
            Toast.makeText(getApplicationContext(), "3 Steps down",
                    Toast.LENGTH_SHORT).show();
            org.jsoup.select.Elements div = diggTest.select("div");
            Toast.makeText(getApplicationContext(), "4 Steps down",
                    Toast.LENGTH_SHORT).show();
            String divBrow = div.toString();
            Toast.makeText(getApplicationContext(), "5 Steps down",
                    Toast.LENGTH_SHORT).show();
            Log.d(this.getClass().getSimpleName(), "data is " + divBrow);
        } catch (Exception e) {
            e.printStackTrace();

            Toast.makeText(getApplicationContext(), "Gave up",
                    Toast.LENGTH_SHORT).show();
            String diggBrow = url;
        }
    }
}
```

Here is the value of `divBrow`:

```text
10-10 11:58:45.631: D/SimpleDiggActivity(350): data is <div class="site-header-container page-container"> 
10-10 11:58:45.631: D/SimpleDiggActivity(350): <header class="site-header">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <h1 class="site-header-logo-container"><a href="/" id="site-header-logo" class="image-replace">Digg</a></h1>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </header>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div id="container" class="page-container">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <ul id="top-stories">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-container story-1" data-content-id="Racz8K" id="story-Racz8K">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-details">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-kicker">
10-10 11:58:45.631: D/SimpleDiggActivity(350): NO FILTER
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-headline">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <a data-position="0" class="story-link" href="http://www.fastcompany.com/3001994/no-filter-inside-hipstamatics-lost-year-searching-next-killer-social-app" data-content-id="Racz8K"> Inside Hipstamatic’s Lost Year Searching For The Next Killer Social&nbsp;App </a>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-domain">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-link-wrapper">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <a data-position="0" class="story-link" href="http://www.fastcompany.com/3001994/no-filter-inside-hipstamatics-lost-year-searching-next-killer-social-app" data-content-id="Racz8K">fastcompany.com</a>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-actions">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <span class="story-action-item story-score"> <span class="story-score-details">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <ul class="story-score-details-list">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-score-thumb-Racz8K story-score-thumb">20</li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-score-tweets-Racz8K story-score-twitter">402</li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-score-fb_shares-Racz8K story-score-facebook">72</li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </ul> </span> <span class="story-score-Racz8K">494</span> </span>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-image">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <a data-position="0" class="story-link" href="http://www.fastcompany.com/3001994/no-filter-inside-hipstamatics-lost-year-searching-next-killer-social-app" data-content-id="Racz8K"><img src="http://static.digg.com/images/Racz8K_1_www_large_thumb.jpeg" alt="" width="312" height="170" /></a>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-preview">
10-10 11:58:45.631: D/SimpleDiggActivity(350): From rooftop bashes and acquisition talks to staff clashes and layoffs, Hipstamatic’s founders and ex-employees describe the startup’s losing struggle to keep pace with Instagram, Facebook, and others in the white-hot photo-sharing space.
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div> </li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-container story-1" data-content-id="Qa2sP3" id="story-Qa2sP3">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-details">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-kicker">
10-10 11:58:45.631: D/SimpleDiggActivity(350): PHOTOGRAPHY
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-headline">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <a data-position="1" class="story-link" href="http://lens.blogs.nytimes.com/2012/10/09/looking-into-the-eyes-of-made-in-china/" data-content-id="Qa2sP3"> Looking Into The Eyes Of 'Made In&nbsp;China' </a>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-domain">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-link-wrapper">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <a data-position="1" class="story-link" href="http://lens.blogs.nytimes.com/2012/10/09/looking-into-the-eyes-of-made-in-china/" data-content-id="Qa2sP3">lens.blogs.nytimes.com</a>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-actions">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <span class="story-action-item story-score"> <span class="story-score-details">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <ul class="story-score-details-list">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-score-thumb-Qa2sP3 story-score-thumb">0</li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-score-tweets-Qa2sP3 story-score-twitter">252</li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <li class="story-score-fb_shares-Qa2sP3 story-score-facebook">411</li>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </ul> </span> <span class="story-score-Qa2sP3">663</span> </span>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-image">
10-10 11:58:45.631: D/SimpleDiggActivity(350): <a data-position="1" class="story-link" href="http://lens.blogs.nytimes.com/2012/10/09/looking-into-the-eyes-of-made-in-china/" data-content-id="Qa2sP3"><img src="http://static.digg.com/images/Qa2sP3_1_www_large_thumb.jpeg" alt="" width="312" height="170" /></a>
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
10-10 11:58:45.631: D/SimpleDiggActivity(350): <div class="story-preview">
10-10 11:58:45.631: D/SimpleDiggActivity(350): In “Faces of Made in China,” a series of typological portraits looking at workers inside six Chinese factories, the photographer Lucas Schifres seeks to consider the otherwise anonymous people who produce our essential possessions by looking directly into their eyes.
10-10 11:58:45.631: D/SimpleDiggActivity(350): </div>
```

Please try it on your end and let me know how it goes.
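The logged `divBrow` value shows that `select("div")` returns every `div` on the page, including the header and page wrappers, so the result still carries the site layout around each story. In that markup the headlines sit under `ul#top-stories` in `div.story-headline a`, so a Jsoup query along the lines of `diggTest.select("#top-stories .story-headline a")` should narrow the selection to just those links (the selector is inferred from the logged markup and may not match Digg's current page). The sketch below covers the step after that: turning the extracted headline text and link pairs into a small HTML page that `WebView.loadDataWithBaseURL()` can display. `HeadlineRenderer`, `Headline`, `escape`, and `toHtml` are hypothetical names, and Jsoup is deliberately left out so the sketch runs on a plain JVM.

```java
import java.util.List;

// Hypothetical helper: renders scraped (headline, link) pairs as a minimal HTML page.
final class HeadlineRenderer {

    // Simple holder for one scraped headline (an assumption of this sketch).
    static final class Headline {
        final String text;
        final String href;

        Headline(String text, String href) {
            this.text = text;
            this.href = href;
        }
    }

    private HeadlineRenderer() {
    }

    // Escapes the characters that are significant in HTML text and quoted attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds one list item per headline inside a bare-bones UTF-8 page.
    static String toHtml(List<Headline> headlines) {
        StringBuilder page = new StringBuilder(
                "<!DOCTYPE html><html><head><meta charset=\"utf-8\"></head><body><ul>");
        for (Headline h : headlines) {
            page.append("<li><a href=\"").append(escape(h.href)).append("\">")
                .append(escape(h.text)).append("</a></li>");
        }
        return page.append("</ul></body></html>").toString();
    }
}
```

In the activity, the result might then be shown with `browser.loadDataWithBaseURL("http://digg.com/", HeadlineRenderer.toHtml(headlines), "text/html", "UTF-8", null);`, called on the UI thread. Escaping matters because scraped titles such as "Looking Into The Eyes Of 'Made In China'" contain quotes that would otherwise break the generated markup.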

Regarding "java - Unable to scrape an HTML website?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12792155/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Sichuan ICP filing No. 2022000587