
c# - Scraping a website with JavaScript cookies and C#

Reposted. Author: 行者123. Updated: 2023-11-28 10:32:41

I want to scrape some content from the following website: http://www.conrad.nl/modelspoor

This is my function:

public string SreenScrape(string urlBase, string urlPath)
{
    CookieContainer cookieContainer = new CookieContainer();
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(urlBase + urlPath);
    httpWebRequest.CookieContainer = cookieContainer;
    httpWebRequest.UserAgent = "Mozilla/6.0 (Windows; U; Windows NT 7.0; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.9 (.NET CLR 3.5.30729)";
    WebResponse webResponse = httpWebRequest.GetResponse();
    string result = new System.IO.StreamReader(webResponse.GetResponseStream(), Encoding.Default).ReadToEnd();
    webResponse.Close();

    if (result.Contains("<frame src="))
    {
        Regex metaregex = new Regex("http:[a-z:/._0-9!?=A-Z&]*", RegexOptions.Multiline);
        result = result.Replace("\r\n", "");
        Match m = metaregex.Match(result);
        string key = m.Groups[0].Value;

        foreach (Match match in metaregex.Matches(result))
        {
            HttpWebRequest redirectHttpWebRequest = (HttpWebRequest)WebRequest.Create(key);
            redirectHttpWebRequest.CookieContainer = cookieContainer;
            webResponse = redirectHttpWebRequest.GetResponse();
            string redirectResponse = new System.IO.StreamReader(webResponse.GetResponseStream(), Encoding.Default).ReadToEnd();
            webResponse.Close();
            return redirectResponse;
        }
    }
    return result;
}

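But when I do this, the website, which uses JavaScript, returns a string containing an error.

Does anyone know how to fix this?

Best answer

Using the article on my blog (Use C# to Scrape web pages), I was able to fetch the page. Here is the code: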

// Requires: using System; using System.IO; using System.Net; using System.Text;
string target = @"http://www1.conrad.nl/modelspoor/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create( target );

// Dispose the response together with its stream and reader.
using ( HttpWebResponse response = (HttpWebResponse)request.GetResponse() )
using ( Stream responseStream = response.GetResponseStream() )
using ( StreamReader htmlStream = new StreamReader( responseStream, Encoding.UTF8 ) )
{
    Console.WriteLine( htmlStream.ReadToEnd() );
}

HTH
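
For pages that do come back as a frameset, as the question's function anticipates, the following is a minimal sketch of combining the two approaches: fetch the page, pull out the first frame's `src`, and fetch that URL with the same `CookieContainer`. The class name `FrameScraper`, its method names, and the regular expression are assumptions made for illustration and appear in neither the question nor the answer. Because `HttpWebRequest` does not execute JavaScript, any cookie the site sets from script will still be missing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class FrameScraper
{
    // Captures the src attribute of <frame> and <iframe> tags
    // (single- or double-quoted values only).
    private static readonly Regex FrameSrc = new Regex(
        @"<i?frame\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns every frame source found in the HTML, in document order.
    public static List<string> ExtractFrameSources(string html)
    {
        var sources = new List<string>();
        foreach (Match match in FrameSrc.Matches(html))
        {
            sources.Add(match.Groups[1].Value);
        }
        return sources;
    }

    // Downloads a URL, sending and collecting cookies through the given container.
    public static string Fetch(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Fetches the page; if it contains frames, fetches the first frame instead.
    public static string FetchFollowingFirstFrame(string url)
    {
        var cookies = new CookieContainer();
        string html = Fetch(url, cookies);

        List<string> frames = ExtractFrameSources(html);
        if (frames.Count == 0)
        {
            return html;
        }

        // Resolve a relative src against the URL of the page that contained it.
        var frameUri = new Uri(new Uri(url), frames[0]);
        return Fetch(frameUri.AbsoluteUri, cookies);
    }
}
```

A call such as `FrameScraper.FetchFollowingFirstFrame("http://www1.conrad.nl/modelspoor/")` would exercise it. Capturing the `src` value directly, rather than matching anything that starts with `http:`, also handles relative frame URLs, and it avoids reusing the first match for every iteration the way the `key` variable does in the question's loop.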

Regarding "c# - Scraping a website with JavaScript cookies and C#", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/2610526/
