
c#-4.0 - Recursive link scraper in C#

Reposted. Author: 行者123. Updated: 2023-12-02 08:21:36

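I've been struggling with this all day and can't seem to figure it out. I have a function that gives me a list of all the links on a given URL. That works fine. However, I want to make this function recursive, so that it follows the links found by the first search, adds the links it finds there to the list, and keeps going until it has visited every page on the website. How can I make this recursive?

My code: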

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

class Program
{
    public static List<LinkItem> urls;
    private static List<LinkItem> newUrls = new List<LinkItem>();

    static void Main(string[] args)
    {
        WebClient w = new WebClient();
        int count = 0;
        urls = new List<LinkItem>();
        newUrls = new List<LinkItem>();
        urls.Add(new LinkItem{Href = "http://www.smartphoto.be", Text = ""});

        // Process one "level" of links per pass until no new links turn up.
        while (urls.Count > 0)
        {
            foreach (var url in urls)
            {
                if (RemoteFileExists(url.Href))
                {
                    string s = w.DownloadString(url.Href);
                    newUrls.AddRange(LinkFinder.Find(s));
                }
            }
            urls = newUrls.Select(x => new LinkItem{Href = x.Href, Text=""}).ToList();
            count += newUrls.Count;
            newUrls.Clear();
            ReturnLinks();
        }

        Console.WriteLine();
        Console.Write("Found: " + count + " links.");
        Console.ReadLine();
    }

    private static void ReturnLinks()
    {
        foreach (LinkItem i in urls)
        {
            Console.WriteLine(i.Href);
            //ReturnLinks();
        }
    }

    private static bool RemoteFileExists(string url)
    {
        try
        {
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
            request.Method = "HEAD";
            // Get the web response; dispose it so the connection is released.
            using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
            {
                // Returns true if the status code is 200.
                return (response.StatusCode == HttpStatusCode.OK);
            }
        }
        catch
        {
            // Any failure (bad URL, network error, non-HTTP scheme) counts as "does not exist".
            return false;
        }
    }
}

The code behind LinkFinder.Find can be found here: http://www.dotnetperls.com/scraping-html

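Does anyone know how to make this function, or the ReturnLinks function, recursive? I would rather not touch the LinkFinder.Find method, because it works perfectly for a single link; I should be able to call it as many times as needed to extend my final list of URLs.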

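Best answer

I assume you want to load each link, find the links inside it, and keep going until you run out of links?

Since the recursion depth could become very large, I would avoid recursion and process the links in a loop instead. I think this should work: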

WebClient w = new WebClient();
int count = 0;
// Note: urls is presumably redeclared as List<string> here; newUrls stays List<LinkItem>.
urls = new List<string>();
newUrls = new List<LinkItem>();
urls.Add("http://www.google.be");

while (urls.Count > 0)
{
    foreach (var url in urls)
    {
        string s = w.DownloadString(url);
        newUrls.AddRange(LinkFinder.Find(s));
    }
    // The links found in this pass become the pages to fetch in the next pass.
    urls = newUrls.Select(x => x.Href).ToList();
    count += newUrls.Count;
    newUrls.Clear();
    ReturnLinks();
}

Console.WriteLine();
Console.Write("Found: " + count + " links.");
Console.ReadLine();
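
One thing to watch for with the loop above: it keeps no record of pages it has already fetched, so on a site whose pages link back to each other it may never finish, and relative hrefs would likely fail when passed straight to DownloadString. The following is only a sketch of one way to address that, not part of the original answer. The names Crawler, Crawl, fetchLinks and maxPages are made up for illustration, and the link extraction is passed in as a delegate so that LinkFinder.Find can stay untouched.

```csharp
using System;
using System.Collections.Generic;

public static class Crawler
{
    // fetchLinks is a hypothetical delegate: given a page URL, it returns the raw
    // href values found on that page (for example by wrapping LinkFinder.Find).
    public static List<string> Crawl(
        string startUrl,
        Func<string, IEnumerable<string>> fetchLinks,
        int maxPages)
    {
        var root = new Uri(startUrl);
        var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var queue = new Queue<Uri>();
        var pages = new List<string>();

        // Normalise by dropping any #fragment so the same page is not queued twice.
        string rootKey = root.GetLeftPart(UriPartial.Query);
        visited.Add(rootKey);
        queue.Enqueue(new Uri(rootKey));

        // A queue replaces recursion, so deep sites cannot overflow the stack.
        while (queue.Count > 0 && pages.Count < maxPages)
        {
            Uri current = queue.Dequeue();
            pages.Add(current.AbsoluteUri);

            foreach (string href in fetchLinks(current.AbsoluteUri))
            {
                if (string.IsNullOrEmpty(href)) continue;

                // Resolve relative hrefs against the current page; skip malformed ones.
                Uri next;
                if (!Uri.TryCreate(current, href, out next)) continue;

                // Skip mailto:, javascript: and similar non-HTTP links.
                if (next.Scheme != Uri.UriSchemeHttp && next.Scheme != Uri.UriSchemeHttps) continue;

                // Assumption: only pages on the starting host should be crawled.
                if (!string.Equals(next.Host, root.Host, StringComparison.OrdinalIgnoreCase)) continue;

                string key = next.GetLeftPart(UriPartial.Query);
                if (visited.Add(key)) queue.Enqueue(new Uri(key));
            }
        }
        return pages;
    }
}
```

With the code from the question, the delegate could be something like url => LinkFinder.Find(w.DownloadString(url)).Select(x => x.Href), assuming LinkItem exposes Href as shown above. The maxPages cap is an extra safety net for very large sites; the same-host filter is a guess at what "all pages on the website" means and can be dropped if external links should be followed too.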

Regarding c#-4.0 - Recursive link scraper in C#, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6880093/
