I am trying to download a CSV file from the site below. The file is small and takes about 2 seconds to download in any browser.
http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download
I tried both HttpWebRequest and WebClient, but it looks like nasdaq.com refuses to send the data through either of them. I also tried inspecting the traffic with Fiddler, with no results. I can only download this data with a browser.
I tried changing headers, proxies, the security protocol, redirects, some cookies, and many other settings, but the problem persists.
If anyone has any idea how to make this work, please let me know, and if you have a solution, please reply to this post. Thanks.
The code below is C# targeting .NET Framework 4.5+.
It can download from other websites, but not from nasdaq.com.
static void Main(string[] args)
{
    try
    {
        string testUrl = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";
        HttpWebRequestTestDownload(testUrl);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
public static void HttpWebRequestTestDownload(string address)
{
    // Example from
    // https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.getresponse(v=vs.110).aspx
    System.Net.HttpWebRequest wReq = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(address);
    wReq.KeepAlive = false;
    ServicePointManager.Expect100Continue = true;
    // Set SecurityProtocol once; a second assignment would simply overwrite the first.
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
    // I also tried the settings below and it still did not work:
    //wReq.AllowAutoRedirect = true;
    //wReq.KeepAlive = false;
    //wReq.Timeout = 10 * 60 * 1000; // 10 minutes
    //wReq.Accept = "application/csv,application/json,text/csv,text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    ////User agent strings: http://www.useragentstring.com/
    //wReq.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36";
    //wReq.ProtocolVersion = HttpVersion.Version11;
    //wReq.Headers.Add("Accept-Language", "en_eg");
    //wReq.ServicePoint.Expect100Continue = false;
    ////Fixing invalid SSL problem
    //System.Net.ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
    ////Fixing "The underlying connection was closed: An unexpected error occurred on a send" for Framework 4.5 or higher
    //ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
    //wReq.Headers.Add("Accept-Encoding", "gzip, deflate");
    // Set some reasonable limits on resources used by this request.
    wReq.MaximumAutomaticRedirections = 4;
    wReq.MaximumResponseHeadersLength = 4; // in KB
    // Set credentials to use for this request.
    wReq.Credentials = System.Net.CredentialCache.DefaultCredentials;
    using (System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)wReq.GetResponse())
    using (System.IO.Stream receiveStream = response.GetResponseStream())
    // Pipe the stream to a higher-level stream reader with the required encoding format.
    using (System.IO.StreamReader readStream = new System.IO.StreamReader(receiveStream, Encoding.UTF8))
    {
        Console.WriteLine("Content length is {0}", response.ContentLength);
        Console.WriteLine("Content type is {0}", response.ContentType);
        Console.WriteLine("Response stream received.");
        Console.WriteLine(readStream.ReadToEnd());
    }
}
public static void WebClientTestDownload(string address)
{
    // WebClient is IDisposable, so wrap it in a using block.
    using (System.Net.WebClient client = new System.Net.WebClient())
    {
        string reply = client.DownloadString(address);
    }
}
I was able to solve the problem. A small tip for everyone: capture the request with Fiddler and send the same headers. It worked once I included all the headers the site requires.
string url = "http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";
using (WebClient web = new WebClient())
{
    web.Headers[HttpRequestHeader.Host] = "www.nasdaq.com";
    web.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
    web.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
    web.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36";
    string reply = web.DownloadString(url);
}
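The same header-replication approach can also be sketched with HttpClient (available on .NET Framework 4.5+). This is a minimal sketch, not the original poster's code: the URL and header values are copied from the WebClient answer above, and the actual network call is left commented out since the endpoint may no longer respond. Unlike WebClient, HttpClientHandler can decompress gzip/deflate responses automatically, so no Accept-Encoding header is set by hand.

```csharp
using System;
using System.Net;
using System.Net.Http;

class HeaderReplicationDemo
{
    static void Main()
    {
        // URL taken from the question; the endpoint may no longer exist.
        const string url = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";

        var handler = new HttpClientHandler
        {
            // Let the handler decompress gzip/deflate instead of sending Accept-Encoding manually.
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        using (var client = new HttpClient(handler))
        {
            // Replicate the browser headers captured with Fiddler, as in the WebClient answer.
            client.DefaultRequestHeaders.TryAddWithoutValidation("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8");
            client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
                "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36");

            // string csv = client.GetStringAsync(url).Result; // network call left disabled

            // Confirm the headers were registered on the client.
            Console.WriteLine(client.DefaultRequestHeaders.Contains("User-Agent"));
        }
    }
}
```

Note that HttpClient sets the Host header from the request URI automatically, so it does not need to be added explicitly the way it was for WebClient.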