
html - Get text from HTML body

Reposted. Author: 行者123. Updated: 2023-11-28 03:33:12

I have the following HTML code:

<body class="frontend page-object" data-tealium="{"tmsData":{"ad_type":"Marktplatz","page_type":"Ad_View","vertical_id":"5","vertical":"Marktplatz","ad_title":"LEGO+Technic+8045+-+Mini-Teleskoplader+-+2+in+1","num_pictures":"4","category_level_1":"Spielen+%2F+Spielzeug","region_level_id_2":"9","category_level_3":"Lego","region_level_id_3":"117244","category_level_2":"Lego+%2F+Playmobil","region_level_id_1":"-141","price":"6","product_id":"67","category_level_max":"4","region_level_2":"Wien","region_level_3":"Wien%2C+22.+Bezirk%2C+Donaustadt","category_level_4":"Technic","seller_id":"19284847","region_level_1":"%C3%96sterreich","ad_type_id":"67","category_level_id_3":"5191","category_level_id_2":"5182","category_level_id_1":"5136","category_level_id_4":"5199","environment":"web","ad_id":"208824705","post_code":"1220","event_name":"adview","publish_date":"Sun+Jun+18+18%3A51%3A00+CEST+2017"}}" data-adid="208824705">

From this, I am trying to use BeautifulSoup to get the category level: "category_level_1":"Spielen+%2F+Spielzeug". However, I cannot get it.

If I do this: CatId = soup2.select("html body.frontend.page-object")[0].get_text().strip() I get the entire HTML text.

CatId = soup2.find("html body.frontend.page-object", {category_level_1})[0].get_text().strip() gives me nothing. I only need to get Spielen+%2F+Spielzeug. Any idea how to solve this?

Many thanks.

Best answer

One way to get it with JavaScript is:

const category1 = JSON.parse(document.body.getAttribute('data-tealium')).tmsData.category_level_1;

console.log(category1);

To guard against data-tealium being missing or its JSON being unparseable:

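const tealium = document.body.getAttribute('data-tealium');
let parsedData = null;
try {
  // getAttribute returns null when the attribute is missing; JSON.parse(null) yields null
  parsedData = JSON.parse(tealium);
} catch (e) {
  // Malformed JSON: leave parsedData as null
}
const category1 =
  (parsedData &&
    parsedData.tmsData &&
    parsedData.tmsData.category_level_1) || null;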

console.log(category1);
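Since the question is about Python, here is a rough Python sketch of the same idea. The value sits inside the body tag's data-tealium attribute, not in the page text, which is why get_text() cannot return it. The sketch uses only the standard library so it runs without extra packages. The names BodyAttrParser, get_tms_value and sample_html are made up for illustration, and the sample is a shortened stand-in for the real page. With BeautifulSoup, reading soup2.body["data-tealium"] and passing it to json.loads should give the same result.

```python
import json
from html.parser import HTMLParser
from urllib.parse import unquote_plus


class BodyAttrParser(HTMLParser):
    """Collects the attributes of the first <body> tag it sees."""

    def __init__(self):
        super().__init__()
        self.body_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.body_attrs is None:
            self.body_attrs = dict(attrs)


def get_tms_value(html, key):
    """Return tmsData[key] from the body's data-tealium attribute, or None."""
    parser = BodyAttrParser()
    parser.feed(html)
    raw = (parser.body_attrs or {}).get("data-tealium")
    if raw is None:
        return None  # no <body> tag or no data-tealium attribute
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # attribute present but not valid JSON
    if not isinstance(data, dict):
        return None
    tms = data.get("tmsData")
    return tms.get(key) if isinstance(tms, dict) else None


# Shortened, hypothetical sample; the attribute is single-quoted so the
# JSON's double quotes are valid HTML.
sample_html = """<body class="frontend page-object" data-tealium='{"tmsData":{"category_level_1":"Spielen+%2F+Spielzeug"}}' data-adid="208824705"></body>"""

category = get_tms_value(sample_html, "category_level_1")
print(category)                # Spielen+%2F+Spielzeug
print(unquote_plus(category))  # Spielen / Spielzeug
```

unquote_plus is only needed if the decoded, human-readable form is wanted; the raw value is exactly what the question asks for.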

Regarding "html - Get text from HTML body", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44617639/
