
java - Webmaster Tools API: getting more than 1000 crawl errors

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:19:34

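I'm using the new Webmaster Tools API to fetch all the crawl errors (plus details) for my site. Unfortunately it only gives me 1000, but I have around 10000. Is there a way to get all of them?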

Here is the code I'm using:

package main;

import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.auth.oauth2.GoogleTokenResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;

import com.google.api.services.webmasters.Webmasters;
import com.google.api.services.webmasters.Webmasters.Urlcrawlerrorssamples;
import com.google.api.services.webmasters.model.SitesListResponse;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSample;
import com.google.api.services.webmasters.model.UrlCrawlErrorsSamplesListResponse;
import com.google.api.services.webmasters.model.WmxSite;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;


public class WebmastersCommandLine {

    private static String CLIENT_ID = "...";
    private static String CLIENT_SECRET = "...";

    private static String REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob";

    private static String OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly";

    private static String PAGE_URL = "...";

    public static void main(String[] args) throws IOException {
        HttpTransport httpTransport = new NetHttpTransport();
        JsonFactory jsonFactory = new JacksonFactory();

        GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(
                httpTransport, jsonFactory, CLIENT_ID, CLIENT_SECRET, Arrays.asList(OAUTH_SCOPE))
                .setAccessType("online")
                .setApprovalPrompt("auto").build();

        String url = flow.newAuthorizationUrl().setRedirectUri(REDIRECT_URI).build();
        System.out.println("open URL:");
        System.out.println(" " + url);
        System.out.println("code:");
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String code = br.readLine();

        GoogleTokenResponse response = flow.newTokenRequest(code).setRedirectUri(REDIRECT_URI).execute();
        GoogleCredential credential = new GoogleCredential().setFromTokenResponse(response);

        // Create a new authorized API client
        Webmasters service = new Webmasters.Builder(httpTransport, jsonFactory, credential)
                .setApplicationName("WebmastersCommandLine")
                .build();

        Webmasters.Urlcrawlerrorssamples.List req2 = service.urlcrawlerrorssamples().list(PAGE_URL, "notFound", "web");

        try
        {
            UrlCrawlErrorsSamplesListResponse urlList = req2.execute();

            System.out.println("start");

            for(UrlCrawlErrorsSample sample : urlList.getUrlCrawlErrorSample())
            {
                Webmasters.Urlcrawlerrorssamples.Get req3 = service.urlcrawlerrorssamples().get(PAGE_URL, sample.getPageUrl(), "notFound", "web");
                UrlCrawlErrorsSample details = req3.execute();

                System.out.println(sample.getPageUrl() + "," + details.getUrlDetails().getLinkedFromUrls());
            }

        }
        catch(IOException e)
        {
            System.out.println("An error occurred: " + e);
        }

        System.out.println("done");
    }

}

However, this only gives me a list of 1000 errors, and I need all 10000. Does anyone know how to do this?

Best answer

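The Webmaster Tools API's URL Crawl Errors Samples method returns a sample of 1000 crawl errors. It is not meant to return the complete list (that is something you could compile from your server logs). If you'd like to get more samples through the API, one thing you could do is mark these errors as fixed and check back a day later. The API will then generate a new set of samples based on the remaining crawl errors.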
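The mark-as-fixed approach could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: `CrawlErrorClient` and `CrawlErrorCollector` are hypothetical names introduced here for illustration. In a real program, `listSampleUrls` would wrap the `service.urlcrawlerrorssamples().list(...)` call from the question, and `markAsFixed` would wrap the client's `urlcrawlerrorssamples().markAsFixed(...)` call, which presumably needs the read-write `https://www.googleapis.com/auth/webmasters` scope rather than `webmasters.readonly`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical abstraction over the two API calls a round needs (not part of the Google client). */
interface CrawlErrorClient {
    /** Returns the current sample of error URLs (the API caps a sample at 1000); empty if none. */
    List<String> listSampleUrls();

    /** Marks one sampled URL as fixed so that it drops out of later samples. */
    void markAsFixed(String pageUrl);
}

/** Accumulates samples across rounds: run one round, wait about a day, run the next. */
class CrawlErrorCollector {
    // LinkedHashSet removes duplicates and keeps the order in which URLs were first returned.
    private final Set<String> seen = new LinkedHashSet<>();

    /** Runs one round and returns how many previously unseen URLs it added. */
    int runRound(CrawlErrorClient client) {
        int added = 0;
        // Copy the list so that marking entries as fixed cannot disturb the iteration.
        for (String pageUrl : new ArrayList<>(client.listSampleUrls())) {
            if (seen.add(pageUrl)) {
                added++;
            }
            client.markAsFixed(pageUrl);
        }
        return added;
    }

    /** All URLs collected so far, in first-seen order. */
    List<String> collected() {
        return new ArrayList<>(seen);
    }
}
```

A round that adds nothing new suggests the samples are exhausted. Note that marking errors as fixed changes what the Webmaster Tools UI shows, so this is only appropriate if losing that view is acceptable.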

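The samples are in the same order as in the UI, so the first ones you see will be the more important ones. This means there are diminishing returns as you continue: later crawl errors will either be similar to earlier ones or at least be considered less critical. The original blog post has more on the prioritization: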

We determine this based on a multitude of factors, including whether or not you included the URL in a Sitemap, how many places it’s linked from (and if any of those are also on your site), and whether the URL has gotten any traffic recently from search.
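For the complete list rather than a prioritized sample, the answer points to the server logs. A minimal sketch of that idea follows, assuming access logs in the Common or Combined Log Format; `NotFoundLogScanner` and its `userAgentMarker` parameter are illustrative names, not part of any API. It collects the distinct request paths that were answered with 404, optionally restricted to lines whose user agent mentions a marker such as "Googlebot".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotFoundLogScanner {
    // Matches the request line and status code of a Common/Combined Log Format entry,
    // e.g. "GET /old-page HTTP/1.1" 404
    private static final Pattern REQUEST_AND_STATUS =
            Pattern.compile("\"[A-Z]+ (\\S+) [^\"]*\" (\\d{3})\\b");

    /**
     * Returns the distinct request paths answered with 404, in first-seen order.
     * If userAgentMarker is non-null, only lines containing it are considered.
     */
    static Set<String> notFoundPaths(List<String> logLines, String userAgentMarker) {
        Set<String> paths = new LinkedHashSet<>();
        for (String line : logLines) {
            if (userAgentMarker != null && !line.contains(userAgentMarker)) {
                continue;
            }
            Matcher m = REQUEST_AND_STATUS.matcher(line);
            if (m.find() && "404".equals(m.group(2))) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }
}
```

In practice the lines would come from something like `Files.readAllLines` on the access log. The user-agent check is a plain substring match, so it can be fooled by clients that spoof their user agent.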

Regarding "java - Webmaster Tools API: getting more than 1000 crawl errors", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27590319/
