gpt4 book ai didi

python - 使用 Python 和 BeautifulSoup 从 HTML 中抓取数字

转载 作者:太空宇宙 更新时间:2023-11-03 20:36:05 25 4
gpt4 key购买 nike

这是我的作业:

In this assignment you will write a Python program similar to http://www.py4e.com/code3/urllink2.py. The program will use urllib to read the HTML from the data files below, and parse the data, extracting numbers and compute the sum of the numbers in the file.

We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

Sample data: http://py4e-data.dr-chuck.net/comments_42.html (Sum=2553)

Actual data: http://py4e-data.dr-chuck.net/comments_228869.html (Sum ends with 10)

You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data url for the assignment - so only use your own data url for analysis.

我想修复我的代码,因为这是我到目前为止所学到的。我收到一个错误名称

urlib is not defined

..如果我使用导入,那么我的套接字就会出现问题。

import urllib
import re
from bs4 import BeautifulSoup


url = input('Enter - ')
html = urlib.request(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")


sum=0
# Retrieve all of the anchor tags
tags = soup('span')
for tag in tags:
# Look at the parts of a tag
y=str(tag)
x= re.findall("[0-9]+",y)
for i in x:
i=int(i)
sum=sum+i
print(sum)

最佳答案

拼写错误:您有 urlib,它应该是 urllibcontext=ctx 不是必需的:

import re
import urllib
from bs4 import BeautifulSoup

# url = 'http://py4e-data.dr-chuck.net/comments_42.html'
url = 'http://py4e-data.dr-chuck.net/comments_228869.html'

soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html.parser')
s = sum(int(td.text) for td in soup.select('td:last-child')[1:])

print(s)

打印:

2410

编辑:运行脚本:

import urllib.request
import re
from bs4 import BeautifulSoup


html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_228869.html').read()
soup = BeautifulSoup(html, "html.parser")

sum=0
# Retrieve all of the anchor tags
tags = soup('span')
for tag in tags:
# Look at the parts of a tag
y=str(tag)
x= re.findall("[0-9]+",y)
for i in x:
i=int(i)
sum=sum+i
print(sum)

打印:

2410

关于python - 使用 Python 和 BeautifulSoup 从 HTML 中抓取数字,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57165612/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com