
python - Identifying table rows and table data - CSS Selector Python

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:32:28

I have several table-row cases that I want to extract data from:

Case 1

Onsite Service After Remote Diagnosis    April 19, 2014      April 19, 2017

Case 2

CAR                                      October 15, 2016    October 15, 2017
Onsite Service After Remote Diagnosis    October 15, 2016    October 15, 2019

Case 3

NBD ProSupport                           July 16, 2008       July 15, 2011
Onsite Service After Remote Diagnosis    July 16, 2008       July 15, 2011

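The information I need is on the row whose second td contains "Onsite Service After Remote Diagnosis"; in each case it is the date at the far right of that row.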

Expected output:

April 19, 2017
October 15, 2017
July 15, 2011

My code:

from selenium import webdriver
import time
from openpyxl import load_workbook

driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate.get_attribute('innerText'))
    driver.close()


codes = ['159DT3J', '15FDBG2', '10V8YZ1']
scrape(codes)

My output:

April 19, 2014
October 15, 2016
July 16, 2008

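These values come from the first row that appears and the first date td in it. I tried changing tbody > tr > td:nth-child(3), but identifying the row by its text would be better and less error-prone.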
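To make the difference between picking a cell by position and picking it by row text concrete, here is a minimal offline sketch that uses only the standard library and no browser. The markup in SAMPLE is an assumption modelled on Case 3 (an empty first td, the label in the second td, then the two dates); the real Dell page may be structured differently. RowCollector and expiry_for are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the stripped text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell texts
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def expiry_for(html, label="Onsite Service After Remote Diagnosis"):
    """Return the last cell of the first row that has a cell equal to `label`."""
    parser = RowCollector()
    parser.feed(html)
    for cells in parser.rows:
        if label in cells:
            return cells[-1]
    return None


# Assumed markup, modelled on Case 3 from the question.
SAMPLE = """
<table><tbody>
<tr><td></td><td>NBD ProSupport</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
<tr><td></td><td>Onsite Service After Remote Diagnosis</td><td>July 16, 2008</td><td>July 15, 2011</td></tr>
</tbody></table>
"""

print(expiry_for(SAMPLE))  # July 15, 2011
```

Because the row is chosen by its label and the date is taken from the end of that row, the result does not depend on how many other rows the table contains or on their order.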

Best answer

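Since you need the row that contains the text "Onsite Service After Remote Diagnosis", I suggest replacing the line that finds the element with the following: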

expdate = driver.find_element_by_xpath("//td[text()='Onsite Service After Remote Diagnosis']/following-sibling::td")

Here we use an XPath locator and look for the td next to the cell whose text is "Onsite Service After Remote Diagnosis".
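Two details may be worth checking against the real page. First, find_element returns the first match, and the first following-sibling td is the cell immediately to the right of the label, which in the rows shown in the question would be the start date; appending [last()] selects the right-most cell instead. Second, the find_element_by_* helpers were removed in Selenium 4.3, where the equivalent call is find_element(By.XPATH, ...). The sketch below is one possible variant under those assumptions, not the answer author's code; expiry_xpath and read_expiry are hypothetical helper names, and normalize-space(.) is used on the assumption that the cell text may carry surrounding whitespace.

```python
LABEL = "Onsite Service After Remote Diagnosis"


def expiry_xpath(label=LABEL):
    """Build an XPath for the right-most <td> after the cell whose text is `label`."""
    # normalize-space(.) ignores leading/trailing whitespace around the label.
    # [last()] picks the last following sibling instead of the first one.
    # Note: a label containing a single quote would need extra escaping.
    return "//td[normalize-space(.)='%s']/following-sibling::td[last()]" % label


def read_expiry(driver, label=LABEL):
    """Return the text of the right-most cell on the labelled row (hypothetical helper)."""
    # Imported lazily so expiry_xpath can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    cell = driver.find_element(By.XPATH, expiry_xpath(label))
    return cell.get_attribute("innerText").strip()


print(expiry_xpath())
```

Inside the question's loop, this would replace the find_element_by_css_selector line with something like print(read_expiry(driver)).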

For more on python - Identifying table rows and table data - CSS Selector Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/52422266/
