
python - LinkedIn web scraper using Selenium

Reposted. Author: 行者123. Updated: 2023-12-04 17:41:17

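I'm new to web development and scraping in general, and I'm trying to challenge myself by scraping a site like LinkedIn. Because it uses Ember and its element IDs change dynamically, scraping it correctly is a bit tricky.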

I'm trying to locate the "experience section" of a LinkedIn profile with the following code:

experience = driver.find_element_by_xpath('//section[@id = "experience-section"]/ul/li[@class="position"]')

The driver has loaded the entire LinkedIn profile page. I want to get all the positions listed under the experience section. The error message is:

Unable to locate element: {"method":"xpath","selector":"//section[@id = "experience-section"]/ul/li/div[@class="position"]"

I can scrape other parts of LinkedIn, but the experience section has been a real struggle for me. Is the XPath wrong? If so, what should I change?

Thanks

<section id="experience-section" class="pv-profile-section experience-section ember-view"><header class="pv-profile-section__card-header">
<h2 class="pv-profile-section__card-heading t-20 t-black t-normal">
Experience
</h2>

<!----></header>

<ul id="ember1620" class="pv-profile-section__section-info section-info pv-profile-section__section-info--has-no-more ember-view"><li id="ember1622" class="pv-profile-section__sortable-item pv-profile-section__section-info-item relative pv-profile-section__list-item sortable-item ember-view"><div id="ember1623" class="pv-entity__position-group-pager ember-view"> <li id="392598211" class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view"><!----><a data-control-name="background_details_company" href="/company/8736/" id="ember1626" class="ember-view"> <div class="pv-entity__logo company-logo">
<img class="lazy-image pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 loaded" alt="Bill &amp; Melinda Gates Foundation" src="https://media.licdn.com/dms/image/C560BAQHvFIyUvuKtQA/company-logo_400_400/0?e=1556755200&amp;v=beta&amp;t=Qhh8_KnrE-OiuXAutFyeI69tgUF3c1ptC9N12siDO4o">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
<h3 class="t-16 t-black t-bold">Co-chair</h3>

<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Bill &amp; Melinda Gates Foundation</span>
</h4>

<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>2000 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">19 yrs</span>
</h4>
</div>

<!---->
</div>

</a>
<!---->
</li>


</div>
</li><li id="ember1630" class="pv-profile-section__sortable-item pv-profile-section__section-info-item relative pv-profile-section__list-item sortable-item ember-view"><div id="ember1631" class="pv-entity__position-group-pager ember-view"> <li id="392599749" class="pv-profile-section__sortable-card-item pv-profile-section pv-position-entity ember-view"><!----><a data-control-name="background_details_company" href="/company/1035/" id="ember1634" class="ember-view"> <div class="pv-entity__logo company-logo">
<img class="lazy-image pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 loaded" alt="Microsoft" src="https://media.licdn.com/dms/image/C4D0BAQEko6uLz7XylA/company-logo_400_400/0?e=1556755200&amp;v=beta&amp;t=XQhwV5ruWfGBfjgQylV9gkeXD8VnQRBHGd1bOfTs2tw">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
<h3 class="t-16 t-black t-bold">Co-founder</h3>

<h4 class="t-16 t-black t-normal">
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title">Microsoft</span>
</h4>

<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>1975 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">44 yrs</span>
</h4>
</div>

<!---->
</div>

</a>
<!---->
</li>


</div>
</li>
</ul>
<!----></section>

---- Update: I used the solution provided by Sers

driver.get('https://www.linkedin.com/in/williamhgates/')
experience = driver.find_elements_by_xpath('//section[@id = "experience-section"]/ul//li')
for item in experience:
    print(item.text)
    print("")

Somehow I get each result twice:

Co-chair
Company Name
Bill & Melinda Gates Foundation
Dates Employed
2000 – Present
Employment Duration
19 yrs

Co-chair
Company Name
Bill & Melinda Gates Foundation
Dates Employed
2000 – Present
Employment Duration
19 yrs

Co-founder
Company Name
Microsoft
Dates Employed
1975 – Present
Employment Duration
44 yrs

Co-founder
Company Name
Microsoft
Dates Employed
1975 – Present
Employment Duration
44 yrs

Best answer

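The problem with your XPath is that the li elements holding the positions are not directly below the ul (each one is nested inside an outer li and a div), and no li in the markup has the class "position". Try the XPath below: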

//section[@id = "experience-section"]/ul//li
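
To see the difference between the child step (/li) and the descendant step (//li) without opening a browser, here is a minimal sketch that uses Python's standard xml.etree.ElementTree on a trimmed, well-formed stand-in for the markup in the question. The SNIPPET string and the variable names are illustrative assumptions, and ElementTree implements only a subset of XPath, so this mirrors the idea rather than Selenium's exact behavior.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the profile markup (hypothetical).
SNIPPET = """
<html><body>
<section id="experience-section">
  <ul>
    <li class="pv-profile-section__sortable-item">
      <div class="pv-entity__position-group-pager">
        <li class="pv-profile-section pv-position-entity"><h3>Co-chair</h3></li>
      </div>
    </li>
    <li class="pv-profile-section__sortable-item">
      <div class="pv-entity__position-group-pager">
        <li class="pv-profile-section pv-position-entity"><h3>Co-founder</h3></li>
      </div>
    </li>
  </ul>
</section>
</body></html>
"""

root = ET.fromstring(SNIPPET)
section = ".//section[@id='experience-section']"

# Query from the question: direct li children of ul whose class
# attribute is exactly "position". Nothing in the markup has that class.
exact_class = root.findall(section + "/ul/li[@class='position']")

# Query from the answer: every li at any depth below the ul.
any_depth = root.findall(section + "/ul//li")

print(len(exact_class))  # 0
print(len(any_depth))    # 4 (two outer wrappers plus two inner cards)
```

Note that the descendant step picks up both the outer wrapper li and the inner li for each position, which is why the count is four rather than two.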

Update

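driver.get('https://www.linkedin.com/in/williamhgates/')
# Selenium 4.3+ removed find_elements_by_*; there, use driver.find_elements(By.CSS_SELECTOR, ...)
experience = driver.find_elements_by_css_selector('#experience-section .pv-profile-section')
for item in experience:
    print(item.text)
    print("")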
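
The //li XPath most likely prints every position twice because, in the posted markup, each position's li sits inside an outer wrapper li: both match, and the wrapper's text includes the inner one's. The CSS selector above matches only descendants whose class list contains the token pv-profile-section, which in the posted markup is just the inner li. A rough sketch of that difference, using the standard xml.etree.ElementTree on a trimmed stand-in for the page (SNIPPET and the helper visible_text are hypothetical names, and this only approximates what Selenium's .text returns):

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the profile markup (hypothetical).
SNIPPET = """
<section id="experience-section" class="pv-profile-section experience-section">
  <ul>
    <li class="pv-profile-section__sortable-item">
      <div class="pv-entity__position-group-pager">
        <li class="pv-profile-section pv-position-entity"><h3>Co-chair</h3></li>
      </div>
    </li>
    <li class="pv-profile-section__sortable-item">
      <div class="pv-entity__position-group-pager">
        <li class="pv-profile-section pv-position-entity"><h3>Co-founder</h3></li>
      </div>
    </li>
  </ul>
</section>
"""


def visible_text(elem):
    """Join all non-empty text found inside elem, including nested elements."""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


section = ET.fromstring(SNIPPET)

# Every li at any depth: the wrapper and the inner card both match,
# and the wrapper's text contains the card's text.
all_items = [visible_text(li) for li in section.findall("./ul//li")]

# Only descendants carrying the exact class token "pv-profile-section",
# which is what '#experience-section .pv-profile-section' asks for.
cards = [
    visible_text(e)
    for e in section.iter()
    if e is not section and "pv-profile-section" in e.get("class", "").split()
]

print(all_items)  # ['Co-chair', 'Co-chair', 'Co-founder', 'Co-founder']
print(cards)      # ['Co-chair', 'Co-founder']
```

Matching on a whole class token matters here: pv-profile-section__sortable-item on the wrapper is a different token, so the wrapper is skipped.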

Regarding python - LinkedIn web scraper using Selenium, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54392465/
