
python - Collecting the values of certain search results using selectors

Reposted. Author: 太空宇宙. Updated: 2023-11-03 14:52:08

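When I run my Python script, the names come back perfectly. For the phone number and email address, however, I get the labels "Ph." and "Email" instead of their values, as shown below. How can I get the values that follow "Ph." and "Email" using selectors?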

The result I am getting:

arkLAB Architecture Ph. Email
Conrad Gargett Ph. Email
MONDO ARCHITECTS Ph. Email

The script I am using:

import requests
from lxml import html

main_url = "http://www.findanarchitect.com.au/index.php"

def get_content(link):
    payload = {'action':'show_search_result','action_spam':'dDfgEr','txtSearchType':5,'txtPracName':'','optSstate':3,'optRegions':23,'txtPcode':'','txtShowBuildingType':0,'optBuildingType':1,'optHomeType':1,'optBudget':''}
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36'}
    tree = html.fromstring(requests.post(link, data = payload, headers = headers).text)

    for title in tree.cssselect("div#searchresultaplus"):
        names = title.cssselect("h2")[0].text
        phone = title.cssselect("div p > strong:contains('ph.')")[0].text
        email = title.cssselect("div p > strong:contains('Email')")[0].text
        print(names, phone, email)

get_content(main_url)

The element that contains the values:

<div id="searchresultsapluscont">    
<h2>Hugh Gordon Architect P/L</h2>
<div id="archdetails">
<div style="float:left">
<p>
Unit 5/6 Lonsdale Street <br>
BRADDON ACT 2612
</p>
<p>
<strong>Ph.</strong> 02 6253 4448<br>
<strong>Email</strong> info@hughgordon.com.au
</p>
</div>
<div style="float:right" class="yogi_v"><div class="img_box">
<img src="/img/aplusprofile.png" alt="aplus logo">
</div></div>
<div class="clearboth">
<div><img src="/img/fe_img/resultline.png"></div>
<p><br>Our company has been designing homes, apartments &amp; townhouses for the past two decades in the A.C.T. and N.S.W. This experience has allowed us to become a leading architecture firm, with great focus on the Multi-Residential sector. Due to our diverse team of designers, town planners, lawyers and Architects we are able to provide sophisticated and complex design solutions for all sectors of the Built Environment. With our head office based in Canberra, A.C.T. we are centrally located and conveniently placed to service both the Sydney, South Coast and Victorian regions.</p></div>

</div>
<div style="float:right">
<a href="javascript:void(0);" onclick="js_show_profile('3796')"><img src="/files/profile_img/3796/4342_4_preview.jpg" alt="Feature Image"></a>
</div>
<div class="clearboth">
<div style="float:left;"><input type="image" src="/img/fe_img/btn_profileaplus.png" value="View profile" onclick="return js_show_profile('3796')" class="nopad">&nbsp;&nbsp;&nbsp;</div>
<div style="float:left;"><input type="image" src="/img/fe_img/btn_awardsaplus.png" value="Awards" onclick="return js_show_awards('3796')" class="nopad">&nbsp;&nbsp;&nbsp;</div>
<div id="idFavBtn_3796" style="padding-top:1px;"><a href="javascript: void(0)" onclick="js_addto_fav('3796','Hugh Gordon Architect P/L','1')"><img src="/img/addtofavaplus.png"></a></div>
</div>
</div>

By the way, I don't want to use XPath here. Thanks in advance.

Best answer

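Use the tail attribute. The text attribute of a strong element holds only the label inside the tag ("Ph." or "Email"), whereas tail holds the text that directly follows the element's closing tag, up to the next element.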

names = title.cssselect("h2")[0].text
phone = title.cssselect("div p > strong:contains('ph.')")[0].tail.strip()
email = title.cssselect("div p > strong:contains('Email')")[0].tail.strip()
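
To see the difference between text and tail without requesting the site, here is a minimal offline sketch. It assumes a simplified, well-formed copy of the markup from the question (with `<br>` written as `<br/>`) and uses the standard library's xml.etree.ElementTree instead of lxml; lxml follows the same ElementTree text/tail model, so the behaviour of tail should carry over to the elements returned by cssselect. The names FRAGMENT and label_values are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the markup from the question
# (<br> is written as <br/> so the stdlib XML parser accepts it).
FRAGMENT = """
<div id="archdetails">
  <p>
    <strong>Ph.</strong> 02 6253 4448<br/>
    <strong>Email</strong> info@hughgordon.com.au
  </p>
</div>
"""


def label_values(root):
    """Map each <strong> label to the text that follows it (its tail)."""
    values = {}
    for strong in root.iter("strong"):
        # .text is the label inside the tag, e.g. "Ph."
        label = (strong.text or "").strip()
        # .tail is None when nothing follows the element, so guard for it.
        values[label] = (strong.tail or "").strip()
    return values


root = ET.fromstring(FRAGMENT)
details = label_values(root)
print(details["Ph."])    # 02 6253 4448
print(details["Email"])  # info@hughgordon.com.au
```

Guarding with `or ""` before calling strip() avoids an AttributeError on entries where a label has no trailing text, which the one-line form in the answer would raise.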

A similar question about "python - Collecting the values of certain search results using selectors" can be found on Stack Overflow: https://stackoverflow.com/questions/45827687/
