excel - Scraping HTML data from a website using <li> tags

Reposted by: 行者123 Updated: 2023-12-04 19:58:41

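I am trying to get data from this lottery website:
https://www.lotterycorner.com/tx/lotto-texas/2019

The data I want to scrape are the dates and winning numbers from 2017 to 2019. I then want to convert the data into a list and save it to a csv or excel file.

I apologize if my question is hard to understand; I am new to Python. Here is the code I have tried, but I don't know what to do after this: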

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.lotterycorner.com/tx/lotto-texas/2017')
soup = BeautifulSoup(page.content, 'html.parser')
week = soup.find(class_='win-number-table row no-brd-reduis')
dates = week.find_all(class_='win-nbr-date col-sm-3 col-xs-4')
wn = week.find_all(class_='nbr-grp')
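One possible way to continue from here, sketched with only the standard library: once the text of each element in `dates` and `wn` has been pulled out (for example with each element's `get_text()`), the two parallel lists can be paired up and written out with the `csv` module. The sample strings and the file name `lotto.csv` below are placeholders, not values taken from the page, and the real date format on the site may differ.

```python
import csv

# Placeholder strings standing in for the text of the `dates` and `wn`
# elements; the real page's formatting may differ.
date_texts = ["2017-01-04", "2017-01-07"]
number_texts = ["5 7 36 39 40 44", "2 5 14 18 26 27"]

# One row per draw: the date followed by each winning number as an int.
rows = [
    [date] + [int(n) for n in numbers.split()]
    for date, numbers in zip(date_texts, number_texts)
]

# "lotto.csv" is a hypothetical output file name.
with open("lotto.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

Passing `newline=""` to `open` is what the `csv` module documentation recommends, so that no blank lines appear between rows on Windows.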

I would like my result to look like this:

Best answer

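If the page has a table tag, don't use BeautifulSoup directly. It is much easier to let pandas do the work for you (pd.read_html parses the tables under the hood with an HTML parser such as lxml or BeautifulSoup).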

import pandas as pd

years = [2017, 2018, 2019]

frames = []
for year in years:
    url = 'https://www.lotterycorner.com/tx/lotto-texas/%s' % year
    # Take the first table on the page and skip its first row
    table = pd.read_html(url)[0][1:]
    # Split the winning-number string into one column per number
    win_nums = table.loc[:, 1].str.split(" ", expand=True).reset_index(drop=True)
    dates = pd.DataFrame(list(table.loc[:, 0]), columns=['date'])

    frames.append(dates.merge(win_nums, left_index=True, right_index=True))

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames, ignore_index=True)

df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date').reset_index(drop=True)

df.to_csv('file.csv', index=False, header=False)

Output:

print(df)
          date   0   1   2   3   4   5
0   2017-01-04   5   7  36  39  40  44
1   2017-01-07   2   5  14  18  26  27
2   2017-01-11   4  13  16  19  43  51
3   2017-01-14   7   8  10  18  47  48
4   2017-01-18   6  11  17  37  40  49
5   2017-01-21   2  13  17  39  41  46
6   2017-01-25   1  14  19  32  37  46
7   2017-01-28   5   7  30  48  51  52
8   2017-02-01  12  19  26  29  37  54
9   2017-02-04   8  13  19  25  26  29
10  2017-02-08  10  15  47  49  51  52
11  2017-02-11  24  25  26  29  41  53
12  2017-02-15   1   4   5  43  53  54
13  2017-02-18   5  11  14  21  38  44
14  2017-02-22   4   8  21  27  52  53
15  2017-02-25  16  37  42  46  49  54
16  2017-03-01   3  24  33  34  45  51
17  2017-03-04   2   4   5  17  48  50
18  2017-03-08  15  19  24  33  34  47
19  2017-03-11   5   6  24  28  29  37
20  2017-03-15   4  11  19  27  32  46
21  2017-03-18  12  15  16  23  38  43
22  2017-03-22   3   5  15  27  36  52
23  2017-03-25  21  25  27  30  36  48
24  2017-03-29   7   9  11  18  23  43
25  2017-04-01   3  21  28  33  38  52
26  2017-04-05   8  20  21  26  51  52
27  2017-04-08  10  11  12  47  48  52
28  2017-04-12   5  26  30  31  46  54
29  2017-04-15   2  11  36  40  42  53
..         ...  ..  ..  ..  ..  ..  ..
265 2019-07-20   3  35  38  45  50  51
266 2019-07-24   2   9  16  22  46  49
267 2019-07-27   1   2   6   8  20  53
268 2019-07-31  20  24  34  36  41  44
269 2019-08-03   6  17  18  20  26  34
270 2019-08-07   1   3  16  22  31  35
271 2019-08-10  18  19  27  36  48  52
272 2019-08-14  22  23  29  36  39  49
273 2019-08-17  14  18  21  23  40  44
274 2019-08-21  18  28  29  36  48  52
275 2019-08-24  11  31  42  48  50  52
276 2019-08-28   9  21  40  42  49  53
277 2019-08-31   5   7  30  41  44  54
278 2019-09-04   4  26  36  37  45  50
279 2019-09-07  22  23  31  33  40  42
280 2019-09-11   8  11  12  30  31  49
281 2019-09-14   1   3  24  28  31  41
282 2019-09-18   3  24  26  29  45  50
283 2019-09-21   2  20  31  43  45  54
284 2019-09-25   5   9  26  38  41  44
285 2019-09-28  16  18  39  45  49  54
286 2019-10-02   9  26  39  42  47  49
287 2019-10-05   6  10  18  24  32  37
288 2019-10-09  14  18  19  27  33  41
289 2019-10-12   3  11  15  29  44  49
290 2019-10-16  12  15  25  39  46  49
291 2019-10-19  19  29  41  46  50  51
292 2019-10-23   4   5  11  35  44  50
293 2019-10-26   1   2  26  41  42  54
294 2019-10-30  10  11  28  31  40  53

[295 rows x 7 columns]
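The split-and-merge step from the answer can be tried offline on a small hand-made table, without fetching anything. The sketch below assumes the parsed table has the date text in column 0 and the six numbers as one space-separated string in column 1, as the answer's code implies; the two sample rows are made up for illustration, and the `astype(int)` conversion is an addition that the original answer does not perform.

```python
import pandas as pd

# Hand-made stand-in for one parsed table (assumed layout):
# column 0 = date text, column 1 = numbers as one space-separated string.
table = pd.DataFrame({
    0: ["2017-01-07", "2017-01-04"],
    1: ["2 5 14 18 26 27", "5 7 36 39 40 44"],
})

# Split the number string into six columns and convert them to integers.
win_nums = table[1].str.split(" ", expand=True).astype(int)
dates = pd.DataFrame(list(table[0]), columns=["date"])

# Join dates and numbers row by row on the shared index.
result = dates.merge(win_nums, left_index=True, right_index=True)

# Parse the dates and put the draws in chronological order.
result["date"] = pd.to_datetime(result["date"])
result = result.sort_values("date").reset_index(drop=True)
print(result)
```

Converting the numbers to integers is optional, but it keeps them numeric if the frame is later used for counting or filtering rather than only written to a file.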

Regarding "excel - Scraping HTML data from a website using <li> tags", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58561118/
