
python - How can I get the preceding element using known text in HTML?

Reposted. Author: 行者123. Updated: 2023-12-01 07:34:29

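I have HTML like the one below. I need to get the current time from the current-time column, which sits right next to the Rise/Fall column [column 10].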

<table id="table" class="tablesorter">
<thead>
<tr>
<th rowspan="2"><div align="center">Sno</div></th>
<th rowspan="2"><div align="center">Site Id</div></th>
<th rowspan="2"><div align="center">Mandal</div></th>
<th rowspan="2"><div align="center">Piezometer
Location
(Village) </div></th>
<th rowspan="2" ><div align="center">July-18
15/05/2018 <br>10:00 HRS</div></th>

<th rowspan="2" ><div align="center">Nov-18</div></th>
<th rowspan="2" ><div align="center">May-19</div></th>
<th rowspan="2" ><div align="center">June-19</div></th>
<th rowspan="2" ><div align="center">July-19
15/07/2019 <br>10:00 HRS</div></th>
<th colspan="4" ><div align="center">Rise(+)/Fall(-) from current water level
and with reference to</div></th>
</tr>
<tr>
<th ><div align="center">July-18</div></th>
<th ><div align="center">Nov-18</div></th>
<th ><div align="cesnter">May-19</div></th>
<th ><div align="cesnter">Jun-19</div></th>
</tr>
</thead>
<tbody>
<div align="center">

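My goal is to get the current time, which is located in the header cell just before the "Rise/Fall" column. This is the code I wrote: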

import requests
from lxml import html

url = 'http://www.apsdps.gov.in/gw_status.jsp?s1=1'


def scrape():
    print("start round")
    try:
        r=requests.get(url)
        d=r.content.decode(r.encoding)
        tree=html.fromstring(d)
        table = tree.xpath("//table[@id='table']")[0]
        fq_time_ele = tree.xpath("//table[@id='table']//thead//th//[contains(text(),'Rise(+)/Fall(-) from current water level and with reference to')]//preceding-sibling::th[1]//text()")
        curdate = fq_time_ele[0].strip().split()[-1].replace("/", "-")
        curtime = fq_time_ele[1].split(" ")[0].split(":")[0]
        time_str = curdate + "_" + curtime
        print(time_str)
    except Exception as e:
        print("Error ", str(e))
    print("end round")


try:
    scrape()
except:
    print("It is not working")

I need the current time, but the code does not work. Can anyone help me?

Best answer

Use the following approach with a corrected XPath:

import requests
from lxml import html

url = 'http://www.apsdps.gov.in/gw_status.jsp?s1=1'


def scrape():
    print("start round")
    try:
        content = requests.get(url).content
        tree = html.fromstring(content)
        # Select the <th> whose child element contains the known text, then
        # step back to the nearest preceding <th> and take its child's text nodes.
        curr_time_parts = tree.xpath("//table[@id='table']//th[*[contains(text(),'Rise(+)/Fall(-)')]]"
                                     "/preceding-sibling::th[1]/*/text()")
        date_, time_ = curr_time_parts
        # Collapse the line break inside the header cell into a single space.
        date_ = ' '.join(date_.split())
        print(date_, time_)
    except Exception as e:
        print("Error ", str(e))
    print("end round")


try:
    scrape()
except:
    print("It is not working")

Output:

start round
July-19 16/07/2019 16:00 HRS
end round
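
To see why the corrected XPath works without hitting the live site, here is a small offline sketch. It assumes a trimmed copy of the header row pasted in the question; the names `SNIPPET`, `BROKEN`, `on_div` and `parts` are made up for illustration. It checks three things: the question's pattern is not valid XPath because `//[` is a predicate with no node test in front of it; putting the predicate on the inner `div` leaves you on an element that has no `th` siblings; and putting the predicate on the `th` itself lets `preceding-sibling::th[1]` reach the date cell.

```python
from lxml import etree, html

# Trimmed copy of the header row from the question; SNIPPET is an
# illustrative name, not something from the original page or script.
SNIPPET = """
<table id="table" class="tablesorter">
<thead>
<tr>
<th rowspan="2"><div align="center">June-19</div></th>
<th rowspan="2"><div align="center">July-19
15/07/2019 <br>10:00 HRS</div></th>
<th colspan="4"><div align="center">Rise(+)/Fall(-) from current water level
and with reference to</div></th>
</tr>
</thead>
</table>
"""

tree = html.fromstring(SNIPPET)

# 1) The question's pattern: "//[" has no node test, so lxml rejects it.
BROKEN = ("//table[@id='table']//th//[contains(text(),'Rise(+)/Fall(-)')]"
          "//preceding-sibling::th[1]//text()")
try:
    tree.xpath(BROKEN)
    broken_failed = False
except etree.XPathError:
    broken_failed = True

# 2) Valid syntax, wrong context node: this lands on the <div>, which has
#    no <th> siblings, so the result is an empty list.
on_div = tree.xpath(
    "//table[@id='table']//th//*[contains(text(),'Rise(+)/Fall(-)')]"
    "/preceding-sibling::th[1]//text()"
)

# 3) The answer's form: filter the <th> by its child, then step back one <th>.
parts = tree.xpath(
    "//table[@id='table']//th[*[contains(text(),'Rise(+)/Fall(-)')]]"
    "/preceding-sibling::th[1]/*/text()"
)
date_, time_ = (" ".join(p.split()) for p in parts)

print(broken_failed)   # True
print(on_div)          # []
print(date_, time_)    # July-19 15/07/2019 10:00 HRS
```

Also note that in the HTML as pasted, the header text breaks across a line between "level" and "and", so matching the full sentence with single spaces would fail even with valid syntax. Matching only the short prefix 'Rise(+)/Fall(-)' sidesteps that.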

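The question's code was aiming for a single string made of the date and the hour joined by an underscore, while the answer prints the two parts separately. A minimal sketch of that last step, using only the standard library, could look like this. The helper name `build_time_str` is hypothetical, and the sketch assumes the two text values keep the `dd/mm/yyyy` and `HH:MM HRS` layout seen in the output.

```python
from datetime import datetime


def build_time_str(date_text: str, time_text: str) -> str:
    """Return 'dd-mm-yyyy_HH' from the two header text values.

    build_time_str is a hypothetical helper. It assumes date_text looks
    like 'July-19 16/07/2019' and time_text like '16:00 HRS'.
    """
    date_part = date_text.split()[-1]   # last token, e.g. '16/07/2019'
    clock_part = time_text.split()[0]   # first token, e.g. '16:00'
    # strptime raises ValueError if the header layout ever changes,
    # which is easier to notice than a silently malformed string.
    stamp = datetime.strptime(f"{date_part} {clock_part}", "%d/%m/%Y %H:%M")
    return stamp.strftime("%d-%m-%Y_%H")


print(build_time_str("July-19 16/07/2019", "16:00 HRS"))  # 16-07-2019_16
```

Inside `scrape()`, this would be called with the `date_` and `time_` values unpacked from the XPath result.
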
Regarding "python - How can I get the preceding element using known text in HTML?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57054150/
