
python - Selenium webdriver link extraction

Reposted. Author: 太空宇宙. Updated: 2023-11-03 16:43:52

My HTML source is:

   <ul class="content">
<li class="">
<div class="profile-card">
<div class="content">
<a href="https://www.linkedin.com/in/ouafae-ezzine-894b113">
Ouafae Ezzine
</a>
<p class="headline">
Organise vos evenements professionnels &amp; personnels
</p>
<dl class="basic">
<dt>
Location
</dt>
<dd>
France
</dd>
<dt>
Industry
</dt>
</dl>
<table class="expanded hide-mobile">
<tbody>
<tr>
<th>
Current
</th>
<td>
Responsable at Blue Med Events
</td>
</tr>
<tr>
<th>
Past
</th>
<td>
Administrateur achats at Pfizer
</td>
</tr>
<tr>
<th>
Education
</th>
<td>
Universite d'Evry Val d'Essonne
</td>
</tr>
<tr>
<th>
Summary
</th>
<td>
Riche d'une experience de plus de 25 ans dans le domaine de l'organisation evenementielle, je mets mon expertise...
</td>
</tr>
</tbody>
</table>
</div>
</div>
</li>
<li class="">
<div class="profile-card">
<div class="content">
<h3>
<a href="https://www.linkedin.com/in/ouafae-ezzine-892855b6">
Ouafae Ezzine
</a>
</h3>
<p class="headline">
Gerante
</p>
<dl class="basic">
<dt>
Location
</dt>
<dd>
France
</dd>
<dt>
Industry
</dt>
<dd>
Events Services
</dd>
</dl>
<table class="expanded hide-mobile">
<tbody>
<tr>
<th>
Current
</th>
<td>
Gerante
</td>
</tr>
</tbody>
</table>
</div>
</div>
</li>
</ul>

I wrote a Python script that checks whether a given string is present on the page.

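If the string belongs to a profile, I want to extract the anchor link (the `a` tag's href) associated with that particular profile, and I am trying to write the logic for that.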

My Python snippet:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('file:///nfs/users/lpediredla/Documents/linkedin/Top2profLinkedIn.html')

ids = driver.find_elements_by_xpath("//*[contains(text(), 'Organise vos evenements professionnels')]")

#don't know how to associate the element with the profile
#please help with the logic here.


driver.close()

At this point I am stuck trying to associate the matched element with the profile block it sits in.

Any help is much appreciated.

Best answer

What you want is preceding-sibling::a, which finds the anchor tag that precedes the p tag containing the text 'Organise vos evenements professionnels':

"//p[contains(text(), 'Organise vos evenements professionnels')]/preceding-sibling::a"

Using your html:

In [11]: from lxml.html import fromstring

In [12]: xml = fromstring(html)

In [13]: print(xml.xpath("//p[contains(text(), 'Organise vos evenements professionnels')]/preceding-sibling::a"))
[<Element a at 0x7f5cae670188>]

In [14]: print(xml.xpath("//p[contains(text(), 'Organise vos evenements professionnels')]/preceding-sibling::a//text()"))
['\n Ouafae Ezzine\n ']
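
Since the question uses Selenium rather than lxml, the same idea could be sketched there as follows. This is only a sketch under assumptions: `find_profile_links` is a hypothetical helper name, the search text is assumed to contain no single quote, and each profile-card div is assumed to hold exactly one anchor. It swaps preceding-sibling for the ancestor axis so that it also covers the second profile in the sample HTML, where the anchor is wrapped in an h3 and is therefore not a sibling of the p tag.

```python
def find_profile_links(driver, text):
    """Return the href of every profile card whose headline contains `text`."""
    # Climb from the matching <p> to its enclosing profile card, then take
    # the anchor inside that card, whether or not it is wrapped in <h3>.
    # Assumption: `text` contains no single quote.
    xpath = (
        "//p[contains(text(), '%s')]"
        "/ancestor::div[@class='profile-card']//a" % text
    )
    # "xpath" is the locator strategy string (the value of By.XPATH).
    anchors = driver.find_elements("xpath", xpath)
    return [a.get_attribute("href") for a in anchors]


# Usage sketch (needs Selenium and a browser driver; the path is a placeholder):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("file:///path/to/Top2profLinkedIn.html")
#   print(find_profile_links(driver, "Organise vos evenements professionnels"))
#   driver.quit()
```

If the page only ever uses the first layout, the original preceding-sibling::a expression can be passed to `driver.find_elements` in the same way.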

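If you want a case-insensitive match, you can use translate to lowercase the node text before comparing it with a lowercase search string:

 "//p[contains(translate(text(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'), 'organise vos evenements professionnels')]/preceding-sibling::a"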

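Because translate() only lowercases the letters listed in its second argument, a reusable version is easier to get right if it maps the whole ASCII alphabet. The helper below is a minimal sketch of that: `icontains_xpath` is a hypothetical name, the search text is assumed to contain no single quote, and only ASCII letters are folded, so accented characters would still be compared case-sensitively.

```python
import string


def icontains_xpath(tag, text):
    """Build an XPath 1.0 test matching `text` inside `tag`, ignoring ASCII case."""
    # translate() maps each character of its second argument to the character
    # at the same position in the third, so passing the full alphabet
    # lowercases every ASCII letter in the node text.
    lowered = "translate(text(), '%s', '%s')" % (
        string.ascii_uppercase,
        string.ascii_lowercase,
    )
    # The search text is lowercased too, otherwise the comparison cannot match.
    return "//%s[contains(%s, '%s')]" % (tag, lowered, text.lower())


expr = icontains_xpath("p", "ORGANISE vos Evenements") + "/preceding-sibling::a"
print(expr)
```

The resulting string can be handed to lxml's `xpath()` or to Selenium's `find_elements` like any other XPath expression.
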
Regarding python - Selenium webdriver link extraction, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36522607/
