
c# - How do I scrape information from an ASP.NET site that uses paging and JavaScript links?

Reposted. Author: 太空狗. Updated: 2023-10-29 21:47:46

I was given a staff list that is supposed to be up to date, but it doesn't match the intranet People Finder, which is written in ASP.NET.

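Because the information is sensitive, I can't access the database behind People Finder, so the only way to get the information is to scrape the structure, starting from the most senior person at the top and working down through each level in turn.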

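Each person has a staff number (SRN), which forms a URL such as http://intranet/peoplefinder/index.aspx?srn=ABC1234. Everyone who reports to that person is listed below in the format <a id="gvEmployees_ctl03_lnkFullName" href="index.aspx?srn=ABC4321" target="_self">, where each URL contains the staff number and links to that person's own team.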
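As a rough illustration of that link format, the following C# sketch pulls the SRNs of the direct reports out of a page's HTML with a regular expression. It assumes the markup matches the sample in this question (an `id` of the form `gvEmployees_ctlNN_lnkFullName` followed directly by `href`); `ReportLinks` and `ExtractReportSrns` are hypothetical names, and a real HTML parser would cope better if the markup varies.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts report SRNs from gvEmployees name links.
// Assumes the markup looks like the sample page in the question.
public static class ReportLinks
{
    // id="gvEmployees_ctlNN_lnkFullName" href="...srn=XYZ" -> captures XYZ.
    private static readonly Regex LinkPattern = new Regex(
        @"id=""gvEmployees_ctl\d+_lnkFullName""\s+href=""[^""]*[?&]srn=([^""&]+)""",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractReportSrns(string html)
    {
        var srns = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            srns.Add(m.Groups[1].Value);
        }
        return srns;
    }
}
```

The `hlManager` "Reports to" link is deliberately not matched, because its `id` is different.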

The problem arises when a team is large, because the GridView is paged and the page links look like <a href="javascript:__doPostBack('gvEmployees','Page$2')">2</a>.
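A minimal sketch of reading the pager row, assuming the markup matches the sample page in this question: it collects the `N` from every `__doPostBack('gvEmployees','Page$N')` link and ignores the `Sort$...` header links. `PagerLinks` and `ExtractPageNumbers` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds the page numbers the GridView pager links to.
public static class PagerLinks
{
    // Matches __doPostBack('gvEmployees','Page$N') and captures N.
    private static readonly Regex PagePattern = new Regex(
        @"__doPostBack\('gvEmployees','Page\$(\d+)'\)");

    // Returns distinct page numbers in the order they appear in the HTML.
    public static List<int> ExtractPageNumbers(string html)
    {
        var pages = new List<int>();
        foreach (Match m in PagePattern.Matches(html))
        {
            int page = int.Parse(m.Groups[1].Value);
            if (!pages.Contains(page))
            {
                pages.Add(page);
            }
        }
        return pages;
    }
}
```

The current page is rendered as a plain `<span>`, so it is not returned. If a team has more pages than the pager shows at once, the extra pages are likely reachable only through further pager links, so it is safer to re-read the pager after each postback.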

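How would I scrape this page, capturing the SRN and other details of everyone who reports to the person across all pages of the GridView, and then visit each of those reports and repeat the process until the whole list is complete?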
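The overall walk can be sketched as a breadth-first traversal from the most senior SRN, with a visited set so that nobody is processed twice. `fetchReports` is a hypothetical callback standing in for the HTTP work (an initial GET of `index.aspx?srn=...` followed by one postback per extra GridView page); the sketch shows only the traversal logic, and `OrgCrawler` is likewise a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical crawler: visits every person reachable from rootSrn exactly once.
public static class OrgCrawler
{
    // fetchReports(srn) should return the direct reports of srn from ALL grid pages.
    public static List<string> Crawl(string rootSrn, Func<string, IEnumerable<string>> fetchReports)
    {
        var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var order = new List<string>();
        var queue = new Queue<string>();

        visited.Add(rootSrn);
        queue.Enqueue(rootSrn);

        while (queue.Count > 0)
        {
            string srn = queue.Dequeue();
            order.Add(srn);

            foreach (string report in fetchReports(srn))
            {
                // HashSet.Add returns false for an SRN already seen, which guards
                // against cycles and people who appear in more than one team.
                if (visited.Add(report))
                {
                    queue.Enqueue(report);
                }
            }
        }
        return order;
    }
}
```

Breadth-first order processes one management level at a time, which matches the top-down approach described in the question; a stack instead of a queue would give a depth-first walk over the same people.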

Example HTML of a result page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>
People Finder: Name Surname
</title><link rel="stylesheet" href="/path/to/style.css" type="text/css" /><link rel="stylesheet" href="/path/to/anotherStyle.css" type="text/css" />
<script type="text/javascript" src="/path/to/peoplefinder.js"></script>
</head>
<body>
<form name="form1" method="post" action="/path/to/index.aspx" id="form1">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="### ViewState ###" />
</div>

<script type="text/javascript">
<!--
var theForm = document.forms['form1'];
if (!theForm) {
theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
// -->
</script>


<script src="/path/to/WebResource.axd?d=AueXWrgAf8xSxMTAt1Q4AA2&amp;t=633311832634916698" type="text/javascript"></script>

<div class="HP3CHeader">
<div id="LWHPBanner">
<h1><span id="lblName">Name Surname</span></h1>
</div>
</div>

<div id='CPMain'>
<div id="mainBox">

<div id="pnlEmployeeDetails">

<div id='basicData'>
<img id="imgPhoto" class="photo" src="/path/to/photo.jpg" style="height:69px;width:69px;border-width:0px;" />
<span id="lblBusinessUnit">Business Unit</span>
<span id="lblCostCentreName">Cost Centre</span>
<span id="lblLocation">Location</span>

<a href='/path/to/checkcontactdetails.htm' target='_blank' onclick='return OpenCheckContactDetails();' >Find out how to change your details/photo.</a>
<div id="manager">
<strong>Reports to: </strong><a id="hlManager" href="/path/to/index.aspx?srn=ABC1234">Name Surname</a>
</div>
</div>

<div id='contactData'>

<div id="pnlSrn">
<strong>Staff number:</strong> <span id="lblSrn">ABC1234</span>
</div>


<div id="pnlEmailAddress">
<strong>Email Address:</strong> <span id="lblEmailAddress">Email</span>
</div>
<div style="clear: both"></div>
</div>

</div>

<div id="pnlGrid">

<h3><span id="lblGridTitle">Name's team</span></h3>
<div>
<table class="subordinates" cellspacing="0" cellpadding="2" rules="cols" border="1" id="gvEmployees" style="border-style:None;border-collapse:collapse;">
<tr style="color:Black;background-color:#EFF3FB;border-style:None;font-weight:bold;">
<th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$SRN')" style="color:Black;">SRN</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$FullName')" style="color:Black;">Full name</a></th><th scope="col"><a href="javascript:__doPostBack('gvEmployees','Sort$RACFID')" style="color:Black;">RACFID</a></th>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl02_lnkFullName" href="index.aspx?srn=1K5932" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl03_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl04_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl05_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl06_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl07_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl08_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl09_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:White;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl10_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="reports" style="background-color:#EFF3FB;border-style:None;">
<td style="width:70px;">ABC1234</td><td>
<a id="gvEmployees_ctl11_lnkFullName" href="/path/to/index.aspx?srn=ABC1234" target="_self">Name Surname</a>
</td><td>ABCD</td>
</tr><tr class="PagerStyle" style="color:#000039;border-style:None;">
<td colspan="3"><table border="0">
<tr>
<td><span>1</span></td><td><a href="javascript:__doPostBack('gvEmployees','Page$2')" style="color:#000039;">2</a></td>
</tr>
</table></td>
</tr>
</table>
</div>

</div>
</div>

<div id="searchBox">
<strong>Search People Finder:</strong>
<br /><br />
<span>Forename:</span><br/>
<span><input name="txtFirstname" type="text" id="txtFirstname" /></span><br/>
<span>Surname:</span><br/>
<span><input name="txtSurname" type="text" id="txtSurname" /></span><br/>
<span>RACFID:</span><br/>
<span><input name="txtRacfid" type="text" id="txtRacfid" /></span><br/>
<span>Staff number:</span><br/>
<span><input name="txtSrn" type="text" id="txtSrn" /></span><br/>
<div class="searchBoxItem" style="text-align:center;width:100%"><input type="submit" name="btnFind" value="Search" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;btnFind&quot;, &quot;&quot;, false, &quot;&quot;, &quot;index.aspx&quot;, false, false))" id="btnFind" title="Search for employees member" class="button" style="border-style:Outset;" /></div><br/>
<div>People Finder searches only UK staff.</div>
<!-- <div><a class="execBoardLink" href="/path/to/index.aspx?srn=ABC1234">Show Executive Board</a></div> -->
<div style="margin-top:5px;"><a href="/path/to/phonebook" target="phoneBook" onclick='return OpenPhonebook();' title="Open Phonebook in new window">Open Phonebook</a></div>
</div>
</div>

<div class="contentFooter" style="text-align:center;">
<table width="100%" cellpadding="0" cellspacing="0" border="0" summary="Navigation layout table">
<tr>
<td align="left"><span class="linkArrow">&lt;</span> <a href="javascript:history.back();">Back</a></td>
<td align="center"></td>
<td align="right"><span class="linkArrow">^ </span><a href="#top">Top</a></td>
</tr>
</table>
</div>

<div>

<input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="vy066Txz34y1E515UsTSTDabHKEmdBRCsq7xM0lpJls1" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWCgKM3uTTAgLP/83pDwLfwaTTAQKNguzjCAKt98LeCwLZh62pDwKKqdGpBwLd2q7jAwKa+5aMBAL5zb65C42zY4GBEUKujhjtZ/hZ8sLESfiF" />
</div></form>
</body>
</html>

Best answer

You can POST the postback variables to the page to move through the GridView pages.

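using System.IO;
using System.Net;
using System.Text;
using System.Web; // HttpUtility

string lcUrl = "http://www.mysite.com/page.aspx";

HttpWebRequest loHttp = (HttpWebRequest) WebRequest.Create(lcUrl);

// *** Send any POST data
// A pager link runs __doPostBack('gvEmployees','Page$2'), which fills the
// __EVENTTARGET and __EVENTARGUMENT hidden fields and submits the form,
// so those are the fields to post. ASP.NET will usually also expect the
// page's __VIEWSTATE and __EVENTVALIDATION (see the edit below).
string lcPostData =
    "__EVENTTARGET=" + HttpUtility.UrlEncode("gvEmployees") +
    "&__EVENTARGUMENT=" + HttpUtility.UrlEncode("Page$2");

loHttp.Method = "POST";
// Without this content type ASP.NET does not parse the body as form fields.
loHttp.ContentType = "application/x-www-form-urlencoded";

byte[] lbPostBuffer = Encoding.GetEncoding(1252).GetBytes(lcPostData);
loHttp.ContentLength = lbPostBuffer.Length;

using (Stream loPostData = loHttp.GetRequestStream())
{
    loPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length);
}

string lcHtml;
using (HttpWebResponse loWebResponse = (HttpWebResponse) loHttp.GetResponse())
using (StreamReader loResponseStream =
    new StreamReader(loWebResponse.GetResponseStream(), Encoding.GetEncoding(1252)))
{
    lcHtml = loResponseStream.ReadToEnd();
}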

Then parse the data you need out of the returned string.
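For the parsing step, here is a minimal sketch that reads employee details by the element ids used in the sample page (`lblName`, `lblSrn`, `lblEmailAddress`, `hlManager`). It assumes the `<span id="...">` and `hlManager` markup shown in the question; `EmployeeDetails`, `GetSpanText` and `GetManagerSrn` are hypothetical names.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: reads detail fields from a People Finder page.
public static class EmployeeDetails
{
    // Inner text of <span id="ID">...</span>, HTML-decoded; null if not found.
    public static string GetSpanText(string html, string id)
    {
        Match m = Regex.Match(html,
            "<span id=\"" + Regex.Escape(id) + "\">(.*?)</span>",
            RegexOptions.Singleline);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value.Trim()) : null;
    }

    // SRN from the "Reports to" link (id="hlManager"); null if not found.
    public static string GetManagerSrn(string html)
    {
        Match m = Regex.Match(html,
            "id=\"hlManager\"\\s+href=\"[^\"]*[?&]srn=([^\"&]+)\"");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, `EmployeeDetails.GetSpanText(lcHtml, "lblSrn")` would give the staff number of the page just fetched.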

--Edit--

Here is roughly how I would try sending all the post data:

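string lcPostData =
    "__EVENTTARGET=" + HttpUtility.UrlEncode("gvEmployees") +
    // Pass "Page$2" unescaped: UrlEncode already turns "$" into "%24".
    "&__EVENTARGUMENT=" + HttpUtility.UrlEncode("Page$2") +
    // Both values are copied from the hidden fields of the page you are paging from.
    "&__VIEWSTATE=" + HttpUtility.UrlEncode("<Value of __VIEWSTATE>") +
    "&__EVENTVALIDATION=" + HttpUtility.UrlEncode("<Value of __EVENTVALIDATION>");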

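Rather than pasting the `__VIEWSTATE` value in by hand, the hidden fields can be read from the page you are paging from and echoed back. This is a sketch assuming the hidden inputs are rendered as in the sample (`type`, then `name`, then `value`); `PostBack`, `GetHiddenField` and `BuildBody` are hypothetical names. It uses `WebUtility.UrlEncode` from `System.Net` instead of `HttpUtility`, so it does not need a reference to `System.Web`.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the form body that __doPostBack would submit.
public static class PostBack
{
    // value="..." of <input type="hidden" name="NAME" ...>; null if absent.
    public static string GetHiddenField(string html, string name)
    {
        Match m = Regex.Match(html,
            "<input type=\"hidden\" name=\"" + Regex.Escape(name) + "\"[^>]*\\svalue=\"([^\"]*)\"");
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // application/x-www-form-urlencoded body for __doPostBack(eventTarget, eventArgument),
    // echoing __VIEWSTATE and __EVENTVALIDATION when the page has them.
    public static string BuildBody(string html, string eventTarget, string eventArgument)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("__EVENTTARGET", eventTarget),
            new KeyValuePair<string, string>("__EVENTARGUMENT", eventArgument),
        };
        foreach (string name in new[] { "__VIEWSTATE", "__EVENTVALIDATION" })
        {
            string value = GetHiddenField(html, name);
            if (value != null)
            {
                fields.Add(new KeyValuePair<string, string>(name, value));
            }
        }

        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(WebUtility.UrlEncode(field.Key))
              .Append('=')
              .Append(WebUtility.UrlEncode(field.Value));
        }
        return sb.ToString();
    }
}
```

The resulting string can be sent as `lcPostData` in the request code in the answer, with `ContentType` set to `application/x-www-form-urlencoded`. Because `__VIEWSTATE` can change with each response, read it again from every new page before requesting the next one.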
Regarding "c# - How do I scrape information from an ASP.NET site that uses paging and JavaScript links?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2449328/
