
python - Scraping a script tag with Python Selenium and XPath

Reposted. Author: 行者123. Updated: 2023-12-01 00:31:04

Suppose I want to scrape some metadata from a website:

https://www.diepresse.com/4913597/autocluster-buhlt-um-osterreich-teststrecke-fur-google-autos

More precisely, I want the value of the key fullChannel, which is /home/wirtschaft/international, from this <script> tag:

<script>    
let pageBreakpoint = 'desktop';
let _screen = window.innerWidth;
if (_screen < 640) {
pageBreakpoint = 'mobile';
} else if (_screen < 1024) {
pageBreakpoint = 'tablet';
}

var dataLayer = window.dataLayer || [];
dataLayer.push({
'siteId': 'dpo',
'contentId': '4913597',
'pageType': 'article',
'contentTitle': 'Autocluster buhlt um Österreich-Teststrecke für Google-Autos',
'contentAuthor': '',
'contentElements': '',
'contentType': 'default',
'pageTags': '',
'wordCount': '264',
'wordCountRounded': '400',
'contentSource': '',
'contentPublishingDate': '',
'contentPublishingDateFormat': '28/01/2016',
'contentPublishingTime': '08:52',
'contentPublishingTimestamp': '28/01/2016 08:52:00',
'contentRepublishingTimestamp': '28/01/2016 08:52:00',
'contentTemplate': 'default',
'metaCategory': '',
'channel': 'international',
'fullChannel': '/home/wirtschaft/international',
'canonicalUrl': '',
'fullUrl': window.location.href,
'oewaPath': 'RedCont/Wirtschaft/Wirtschaftspolitik',
'oewaPage': 'homepage',
'isPremium':'no',
'isPremiumArticle': 'free',
'pageBreakpoint': pageBreakpoint,
'userId': ''
});
</script>

Right now I am using Selenium and XPath, but I can't figure out how to apply a regular expression here:

# this doesn't work
driver.find_element_by_xpath("//script[text()]")

Any suggestions?

Best answer

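Use the JavaScript executor to read the value of the variable dataLayer directly from the page. Selenium converts the returned JavaScript array into a Python list of dictionaries.

Then read the value of the key fullChannel from the first entry.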

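from selenium import webdriver

driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
driver.get("https://www.diepresse.com/4913597/autocluster-buhlt-um-osterreich-teststrecke-fur-google-autos")
datalayer = driver.execute_script("return dataLayer")  # Python list of dicts
print(datalayer)
print(datalayer[0]['fullChannel'])
driver.quit()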

Output:

[{'oewaPage': 'homepage', 'contentTitle': 'Autocluster buhlt um Österreich-Teststrecke für Google-Autos', 'userId': '', 'wordCount': '264', 'contentSource': '', 'contentPublishingDate': '', 'contentElements': '', 'contentAuthor': '', 'fullUrl': 'https://www.diepresse.com/4913597/autocluster-buhlt-um-osterreich-teststrecke-fur-google-autos', 'wordCountRounded': '400', 'contentTemplate': 'default', 'canonicalUrl': '', 'contentPublishingTime': '08:52', 'metaCategory': '', 'siteId': 'dpo', 'contentPublishingDateFormat': '28/01/2016', 'isPremium': 'no', 'oewaPath': 'RedCont/Wirtschaft/Wirtschaftspolitik', 'contentRepublishingTimestamp': '28/01/2016 08:52:00', 'contentPublishingTimestamp': '28/01/2016 08:52:00', 'pageTags': '', 'pageBreakpoint': 'desktop', 'contentType': 'default', 'fullChannel': '/home/wirtschaft/international', 'isPremiumArticle': 'free', 'contentId': '4913597', 'channel': 'international', 'pageType': 'article'}, {'faktorVendorData4': 'notset', 'event': 'faktorData', 'faktorData4': 'notset', 'gtm.uniqueEventId': 9, 'faktorData1': 'notset', 'faktorData2': 'notset', 'faktorData5': 'notset', 'faktorData3': 'notset'}, {'gtm.uniqueEventId': 3, 'gtm.start': 1569877670044, 'event': 'gtm.js'}, {'aboStatus': '', 'userId': '', 'userType': 'default', 'userStatus': 'logout'}, {'gtm.uniqueEventId': 6, 'event': 'gtm.dom'}, {'gtm.uniqueEventId': 14, 'gtm.start': 1569877672926, 'event': 'gtm.js'}, {'faktorGdprApplies': 1}, {'gtm.uniqueEventId': 15, 'event': 'gtm.load'}]
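The output above shows that dataLayer holds several entries and that only one of them carries fullChannel. Index 0 works for this page, but the order depends on when each script pushes its data. A minimal sketch that searches for the key instead of relying on the position; the helper name first_value and the trimmed sample list are made up for illustration:

```python
def first_value(entries, key):
    """Return the value of `key` from the first dict in `entries` that has it."""
    for entry in entries:
        # dataLayer may also hold non-dict items, so check the type first
        if isinstance(entry, dict) and key in entry:
            return entry[key]
    return None


# Trimmed sample in the same shape as the output above
sample = [
    {'gtm.uniqueEventId': 3, 'event': 'gtm.js'},
    {'channel': 'international', 'fullChannel': '/home/wirtschaft/international'},
    {'faktorGdprApplies': 1},
]
print(first_value(sample, 'fullChannel'))
```

With a live driver, the list returned by driver.execute_script("return dataLayer") could be passed in place of sample.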

The second print gives the value of the key fullChannel:

/home/wirtschaft/international
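Since the question asked about a regular expression, here is a rough sketch of that route as well. It assumes the script text is already available as a string (for example taken from driver.page_source) and that the value is a single-quoted string literal, as in the snippet from the question. The function name extract_string_value is hypothetical, and the pattern does not handle escaped quotes or unquoted values such as window.location.href:

```python
import re


def extract_string_value(script_text, key):
    """Return the single-quoted string assigned to `key`, or None if absent."""
    # Matches  'key' : 'value'  with optional whitespace around the colon
    pattern = r"'" + re.escape(key) + r"'\s*:\s*'([^']*)'"
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


# Shortened copy of the script body from the question
script_text = """
dataLayer.push({
'channel': 'international',
'fullChannel': '/home/wirtschaft/international',
'isPremium':'no',
'fullUrl': window.location.href
});
"""
print(extract_string_value(script_text, 'fullChannel'))
```

The JavaScript executor approach is usually the more robust of the two, because it reads the evaluated object rather than parsing source text.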

This post, "python - Scraping a script tag with Python Selenium and XPath", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58175020/
