
python - In Python, how do I save the HTML source of multiple pages into one text file?

Reposted. Author: 行者123. Updated: 2023-12-04 00:54:59

I have a list of links (stored in a links.txt file).

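This code saves the result for one link, but I don't know how to make it download the source code of ALL the links (in links.txt) and save them as ONE SINGLE text file for the next processing step...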

import urllib.request    
urllib.request.urlretrieve("https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1", "result.txt")

Sample links from links.txt:

https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=2
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=3
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=4
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=5
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=6
https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=7
....
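Note that `readlines()` keeps the trailing newline on every line, so each URL read that way ends in `\n`, and a blank line in the file becomes an empty "URL". Rather than relying on the HTTP library to tolerate that, it is safer to clean the list first. A minimal sketch; `read_links` is an illustrative name, not something from the original question:

```python
def read_links(path):
    """Return the URLs in `path`, one per line, without newlines or blank lines."""
    links = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            url = line.strip()  # drop the trailing '\n' and any stray spaces
            if url:             # skip empty lines
                links.append(url)
    return links
```

Usage: `links = read_links('links.txt')` gives a plain list of URL strings you can loop over.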

Best answer

urllib

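import urllib.request

# read the URLs; strip() removes the trailing newline that readlines() would keep
with open('links.txt', 'r') as f:
    links = [line.strip() for line in f if line.strip()]

# open the output file ONCE, before the loop; opening it with 'w+' inside the
# loop would truncate it on every iteration and keep only the last page
with open('result.txt', 'w', encoding='utf-8') as out:
    for link in links:
        with urllib.request.urlopen(link) as response:
            # get html text
            html = response.read().decode('utf-8')

        # append html to file, one page after another
        out.write(html + '\n')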
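If you would rather keep the `urllib.request.urlretrieve` call from the question, one option is to download each link into a temporary file and then append the raw bytes to the single output file, which avoids decoding the HTML at all. A minimal sketch; `download_all` is a hypothetical name, and keep in mind that the Python docs list `urlretrieve` under the legacy interface:

```python
import os
import shutil
import tempfile
import urllib.request


def download_all(links, out_path):
    """Download each link with urlretrieve, then concatenate them into out_path."""
    with tempfile.TemporaryDirectory() as tmp, open(out_path, "wb") as out:
        for i, link in enumerate(links):
            part = os.path.join(tmp, f"page_{i}.html")
            # same call as in the question, but one temporary file per link
            urllib.request.urlretrieve(link, part)
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # append the raw bytes to the single file
            out.write(b"\n")  # newline between pages
```

Usage: `download_all(links, 'result.txt')`, with `links` being the list read from links.txt.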

requests

You can also use the requests library, which I find more readable:

pip install requests
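import requests

# read the URLs without their trailing newlines
with open('links.txt', 'r') as f:
    links = [line.strip() for line in f if line.strip()]

# open result.txt once; reopening it with 'w+' inside the loop would overwrite it
with open('result.txt', 'w', encoding='utf-8') as out:
    for link in links:
        response = requests.get(link)
        html = response.text

        # append html to file
        out.write(html + '\n')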
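Whichever library does the downloading, the next processing step may need to know where one page ends and the next begins. A minimal sketch that writes a marker line before each page; the marker format and the names `save_pages` and `fetch_with_urllib` are assumptions for illustration, and the default downloader falls back to UTF-8 when the server declares no charset:

```python
import urllib.request


def fetch_with_urllib(url):
    """Default downloader: fetch one page with urllib and decode it."""
    with urllib.request.urlopen(url, timeout=30) as response:
        # use the charset from the Content-Type header, else assume UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_pages(links, out_path, fetch=fetch_with_urllib):
    """Write every page into ONE text file, each preceded by a marker line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for link in links:
            html = fetch(link)
            # hypothetical marker so a later step can split the pages apart
            out.write(f"<!-- SOURCE: {link} -->\n")
            out.write(html)
            out.write("\n")
```

To download with requests instead, pass a different function, e.g. `save_pages(links, 'result.txt', fetch=lambda url: requests.get(url).text)`.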

Using a loop for page navigation

Use a for loop to generate the page links, since the only thing that changes is the page number:

links = [
    f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'
    for n in range(1, 10)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
]

Or build each link as you go:

for n in range(1, 10):
    link = f'https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn={n}'

    [...]
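Both versions hard-code the full query string. If you would rather derive the pages from any one sample URL, `urllib.parse` can replace just the page parameter. A minimal sketch; `page_links` is a hypothetical helper, and it assumes each query parameter appears only once (repeated keys would be collapsed by the dict):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_links(base_url, pages, page_param="_pgn"):
    """Return one URL per page number, with `page_param` set to that number."""
    parts = urlsplit(base_url)
    # dict keeps the original parameter order; blank values are preserved
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    links = []
    for n in pages:
        query[page_param] = str(n)
        links.append(urlunsplit(parts._replace(query=urlencode(query))))
    return links
```

For the sample URL, `page_links('https://www.ebay.com/sch/i.html?_from=R40&_nkw=abc&_sacat=0&_pgn=1', range(1, 10))` should produce the same nine strings as the list comprehension.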

Regarding "python - In Python, how do I save the HTML source of multiple pages into one text file?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63209446/
