
ruby - Retrieving POST data from a site with Ruby

Reposted. Author: 行者123. Updated: 2023-12-04 16:20:34

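I am trying to retrieve the data returned by a POST request on a site. I have tried many times with combinations of nokogiri, uri, and mechanize, but I only ever get the data from the GET request. The content of the div I am interested in is missing.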

Below is the body fetched from the site. I am looking for the content of div id="list2", which should hold a table of users and their phone numbers.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="Description" content="Wyszukiwarka" />
<meta name="Author" content="LR" />
<title>Tel</title>
<link href="styleblue.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="includes/scripts.js"></script>
<script type="text/javascript" src="includes/jquery-1.6.1.min.js"></script>
<script type="text/javascript" src="includes/jquery.form.js"></script>
<link rel="stylesheet" type="text/css" href="img/themes/blue/style.css" />
<link rel="stylesheet" type="text/css" href="img/themes/ui/smoothness/jquery-ui-1.8.13.custom.css" media="screen"/>
<script type="text/javascript" src="includes/jquery-ui-1.8.13.custom.min.js"></script>
<script type="text/javascript" src="includes/ui.datepicker-pl.js"></script>

<script type="text/javascript">
$(document).ready(function(){
gridReloadTel();
})
</script></head>
<body><table style="width: 100%; margin: 0px; padding: 0px; vertical-align:top" cellpadding="0" cellspacing="0">
<tr class="hideen">
<td style="width: 100%"><table cellpadding="0" cellspacing="0" style="width:100%; margin:0px; padding:0px;">
<tr>
<td id="top_left_login" style="height: 101px"></td>
<td style="height: 101px"><img alt="" src="img/top.jpg" /></td>
<td id="top_right_login" style="height: 101px"><div style="position:relative; width:194px; left:-207px; bottom:36px; text-align:right ">Czwartek&nbsp;&nbsp;&nbsp;<span style="color:#FFFFFF;">03-04-2014</span></div></td>
</tr>
</table></td>
</tr>
<tr class="hideen">
<td id="menu"><div >
<img src="img/blue/mline.jpg" border="0" alt="" /><a href="index.php">Wyszukiwarka</a><img src="img/blue/mline.jpg" border="0" alt="" /><a href="aktualizacja.php">Aktualizacja danych</a><img src="img/blue/mline.jpg" border="0" alt="" /><a href="pomoc.php">Pomoc</a><img src="img/blue/mline.jpg" border="0" alt="" />



</div>//Content
</div>
<br /><br />
<div id="list2">I AM LOOKING FOR THIS DIV</div>

<br />
</div>
<blockquote style="font-size:10px ">
* aktualizacje <br/>
<img src="img/plus.gif" width="18" height="18" />

</blockquote></td>
</tr>
<tr class="hideen">
<td style="width: 100%"><div id="bottom" align="center"><img src="img/bzit.jpg" width="225" height="42" border="0" alt="" /></div></td>
</tr>
</table>
</body>
</html>

When I inspect the site in Firebug, I see GET url/index.php and POST url/grids/search.php. The site is on a local network. When I open the XHR tab and select the POST search.php entry, I see:
Response headers:
Connection Keep-Alive
Content-Type text/html
Date Thu, 03 Apr 2014 05:31:44 GMT
Keep-Alive timeout=15, max=100
Server Apache
Transfer-Encoding chunked
X-Powered-By PHP/5.2.5

Request headers:
Accept */*
Accept-Encoding gzip, deflate
Accept-Language pl,en-US;q=0.7,en;q=0.3
Cache-Control no-cache
Connection keep-alive
Content-Length 99
Content-Type application/x-www-form-urlencoded; charset=UTF-8
Host url
Pragma no-cache
Referer url/index.php
User-Agent Mozilla/5.0 (Windows NT 5.1; rv:28.0) Gecko/20100101 Firefox/28.0
X-Requested-With XMLHttpRequest

Next is the Response tab, which shows the response I am interested in:
    `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="Description" content="Wyszukiwarka telefonów" />
<meta name="Author" content="LR" />
<title>tel</title>
<link rel="stylesheet" type="text/css" href="/img/themes/blue/style.css" />

</head>
<body>

<div id="contenttable">
<table class="scroll" cellpadding="0" cellspacing="0" width="100%" >
<thead >
<tr>
<td colspan="11">Lista wyników *</td>
</tr>
</thead>


<tbody >
ROWS WITH TELEPHONES
</tbody>

</table>
<table class="scroll" cellpadding="0" cellspacing="0" width="100%" >
<tbody >
</tbody>
<tfoot align="center">
<tr>
<td colspan="11" style="text-align:left"><img src="img/themes/blue/images/first.png" onclick="jQuery('#page').val(1);gridReloadTel()" /> <img src="img/themes/blue/images/prev.png" onclick="jQuery('#page').val(1);gridReloadTel()" />
<input id="page" type="text" value="2" size="3" maxlength="5" onkeydown="doSearchTel(arguments[0]||event)" />
/ 802 <img src="img/themes/blue/images/next.png" onclick="jQuery('#page').val(3);gridReloadTel()" /> <img src="img/themes/blue/images/last.png" onclick="jQuery('#page').val(802);gridReloadTel()" /> | wyświetl
<select id="rows" name="rows" onchange="gridReloadTel()">
<option value="15" selected >15</option>
<option value="25" >25</option>
<option value="50" >50</option>
<option value="200" >200</option>
</select>
| 12016 wierszy</td>
</tr>
</tfoot>
</table>

</div>
<div style="position:absolute; top:140px; right:20px;" class="hideen"><form action="export.php" method="post" target="_blank" id="exportform" name="exportform" >
<a href="javascript:document.exportform.submit();" onmouseout="MM_swapImgRestore()" onmouseover="MM_swapImage('xlsex','','img/xls_down.jpg',1)"><img src="img/xls_up.jpg" name="xlsex" border="0" id="xlsex" title="Wygeneruj spis wyb" /></a>
<input name="sord" type="hidden" value="PRNazwa asc" /><input name="where" type="hidden" value=" 1=1 " />
<input type="hidden" name="start" value="15" />
<input type="hidden" name="limit" value="15" />
</form></div>

<script type="text/javascript">

var _gaq = _gaq || [];
_gaq.push(['_setAccount', '']);
_gaq.push(['_trackPageview']);

(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

</script>
</body></html>`

How can I retrieve the data from div id='contenttable'?
Any answers or ideas would be very helpful.

Best answer

Try Mechanize:

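require 'mechanize'
require 'logger'

# Create the agent, then load the index page first in case the site sets session cookies.
@agent = Mechanize.new do |a|
  a.user_agent_alias = 'Windows Chrome'
  a.log = Logger.new "activity.log"
end
@agent.get 'url/index.php'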

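Now you can send the POST request. Mechanize#post takes the URL, a hash of form parameters, and a hash of headers:
# "foo" => "bar" stands in for the real form fields seen in the browser.
@agent.post('url/grids/search.php', { "foo" => "bar" }, { "X-Requested-With" => "XMLHttpRequest" })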
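For comparison, the same kind of request can be sketched with only the Ruby standard library. This is a minimal sketch, not the site's confirmed API: the helper names `build_search_request` and `fetch_search_html`, the host `http://intranet.example/`, and the form fields `page` and `rows` are all assumptions made for illustration (the field names are guessed from the input ids in the response HTML and may differ from what the page actually sends).

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) an XHR-style POST to grids/search.php.
# base_url and form_fields are placeholders supplied by the caller.
def build_search_request(base_url, form_fields)
  uri = URI.join(base_url, 'grids/search.php')
  request = Net::HTTP::Post.new(uri)
  # Encodes the fields as application/x-www-form-urlencoded.
  request.set_form_data(form_fields)
  # Headers copied from the request shown in the browser's XHR tab.
  request['X-Requested-With'] = 'XMLHttpRequest'
  request['Referer'] = URI.join(base_url, 'index.php').to_s
  request
end

# Send the request and return the response body as a string.
def fetch_search_html(base_url, form_fields)
  uri = URI.join(base_url, 'grids/search.php')
  request = build_search_request(base_url, form_fields)
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).body
  end
end

# Hypothetical field names; inspect the real POST body to find the actual ones.
request = build_search_request('http://intranet.example/', { 'page' => '2', 'rows' => '15' })
puts request.body
```

Building the request separately from sending it makes it easy to compare the encoded body and headers against what the browser shows before touching the network.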

To find the query parameters and headers to send, look at the request in your browser's developer tools.

Regarding "ruby - Retrieving POST data from a site with Ruby", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22829001/
