python - How to loop through tags and redirect to retrieve more tags?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:18:21

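For educational purposes, I am trying to write a program that prompts the user for a 'url', a 'count' and a 'position'. The 'url' is scraped and the anchor ('a') tags inside it are retrieved, which produces a list of links. The 'position' is then used to pick a new link from that list, and that link becomes the next 'url' to scrape. The 'count' is the number of times this process is repeated.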
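
The loop described above can be sketched without any network access. This is a minimal Python 3 sketch that assumes the links of each page are already available in a dictionary. `links_by_url` is a hypothetical stand-in for the scraping step, and `follow_links` and the toy graph are illustrative names that are not part of the original program. Because `position` is 1-based, it is converted to a list index with `position - 1`.

```python
def follow_links(links_by_url, start_url, count, position):
    """Follow the link at 1-based `position` on each page, `count` times.

    `links_by_url` maps a page URL to the list of hrefs found on that page;
    it stands in for the scraping step so the loop logic can be seen alone.
    """
    visited = []
    url = start_url
    for _ in range(count):
        links = links_by_url[url]   # links on the *current* page only
        url = links[position - 1]   # convert 1-based position to 0-based index
        visited.append(url)
    return visited


# Hypothetical toy link graph (not the real pages from the question)
graph = {
    "A": ["B", "C", "D"],
    "D": ["E", "F", "G"],
    "G": ["H", "I", "J"],
}
print(follow_links(graph, "A", 3, 3))  # ['D', 'G', 'J']
```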

Code:
import urllib
from bs4 import BeautifulSoup as bfs

# Declare global variables
href_list = []
no_iterations = 0

# Prompt user for input
url = raw_input('Enter url - ')
count = raw_input('Enter count - ')
position = raw_input('Enter position - ')

# While loop with condition
while no_iterations != int(count):
    no_iterations += 1

    # Scraping the url
    html = urllib.urlopen(url).read()
    soup = bfs(html)

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        href_list.append(tag.get('href', None))

    # Assigning new url
    url = href_list[int(position)-1]

    # Printing info for user
    print 'Retrieving:', href_list[int(position)-1]

print 'Last Url:', href_list[int(position)-1]

When I run the program, this is what I get:

Enter url - http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html 
Enter count - 4
Enter position - 3

Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Last Url: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html

Looking at the output, I can see that the URL is not being reset the way it should be. Any suggestions would be appreciated.
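
The repeated URL is consistent with `href_list` never being emptied. Each page's links are appended after the previous ones, so index `position - 1` keeps pointing into the first page's links. The Python 3 sketch below reproduces this on a hypothetical in-memory link graph. The function names and data are illustrative only. It compares the shared-list loop with a version that builds a fresh list for every page.

```python
def follow_accumulating(links_by_url, start_url, count, position):
    # Mirrors the question's loop: one list shared across all iterations
    href_list = []
    url = start_url
    visited = []
    for _ in range(count):
        href_list.extend(links_by_url[url])  # new links are appended at the end
        url = href_list[position - 1]        # index still points into page 1's links
        visited.append(url)
    return visited


def follow_resetting(links_by_url, start_url, count, position):
    # Same loop, but a fresh list is built for every page
    url = start_url
    visited = []
    for _ in range(count):
        href_list = list(links_by_url[url])
        url = href_list[position - 1]
        visited.append(url)
    return visited


graph = {
    "A": ["B", "C", "D"],
    "D": ["E", "F", "G"],
    "G": ["H", "I", "J"],
}
print(follow_accumulating(graph, "A", 3, 3))  # ['D', 'D', 'D']
print(follow_resetting(graph, "A", 3, 3))     # ['D', 'G', 'J']
```

With the shared list, every iteration picks `D` again and therefore also re-scrapes the same page. This matches the identical `Retrieving:` lines above.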

Best answer

I solved it by resetting the list in which I stored the retrieved tags. Code:

import urllib
from bs4 import BeautifulSoup as bfs

# Declare global variables
href_list = []
no_iterations = 0

# Prompt user for input
url = raw_input('Enter url - ')
count = raw_input('Enter count - ')
position = raw_input('Enter position - ')

# While loop with condition
while no_iterations != int(count):
    no_iterations += 1

    # Scraping the url (naming the parser explicitly avoids a bs4 warning)
    html = urllib.urlopen(url).read()
    soup = bfs(html, 'html.parser')

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        href_list.append(tag.get('href', None))

    # Assign the new url, then empty the list so that the next
    # page's links start again at index 0
    url = href_list[int(position)-1]
    href_list = []

    # Printing info for user (href_list is empty at this point, so print url)
    print 'Retrieving:', url

print 'Last Url:', url

So the new output is now:

Enter url - http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html 
Enter count - 4
Enter position - 3
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Montgomery.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Mhairade.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Butchi.html
Retrieving: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Anayah.html
Last Url: http://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Anayah.html
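
In Python 3, `raw_input`, the `print` statement and `urllib.urlopen` no longer exist, so the fixed program has to be ported before it will run there. The sketch below is one possible port under stated assumptions, not the author's code:

- It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`.
- It assumes the pages decode as UTF-8.
- It uses hypothetical helper names: `AnchorCollector`, `fetch_html`, `extract_hrefs` and `crawl`.

Like `soup('a')` with `tag.get('href', None)`, it records every `<a>` tag and stores `None` when a tag has no `href`, so positions line up with the original.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order (None if absent)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))


def fetch_html(url):
    # Network fetch; decoding as UTF-8 is an assumption about the pages
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_hrefs(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs  # a new list for every page, so nothing carries over


def crawl(url, count, position, fetch=fetch_html):
    # Follow the link at 1-based `position`, `count` times
    for _ in range(count):
        hrefs = extract_hrefs(fetch(url))
        url = hrefs[position - 1]
        print("Retrieving:", url)
    print("Last Url:", url)
    return url
```

Interactive use would mirror the original prompts, for example `crawl(input('Enter url - '), int(input('Enter count - ')), int(input('Enter position - ')))`. The `fetch` parameter lets the loop run against in-memory HTML instead of live pages. Because `extract_hrefs` returns a new list for every page, there is no shared list to reset.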

Thanks for your support.

Regarding "python - How to loop through tags and redirect to retrieve more tags?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33423040/
