python - Why can't I get the string when scraping with Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 10:01:59

This is my code. I want to scrape a list of words from a website, but I hit a problem when I call .string:

import requests
from bs4 import BeautifulSoup

url = "https://www.merriam-webster.com/browse/thesaurus/a"
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
view = entry_view[0]
list = view.ul

for m in list:
    for x in m:
        title = x.string
        print(title)

What I want is a printed list of the text from the website, but I get an error:

Traceback (most recent call last):
  File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
    title = x.string
AttributeError: 'str' object has no attribute 'string'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
    title = x.string
AttributeError: 'str' object has no attribute 'string'

Best answer

You can get the result you want with the following code.

Code:

import requests
from bs4 import BeautifulSoup

url = "https://www.merriam-webster.com/browse/thesaurus/a"
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, "html.parser")

entry_view = soup.find_all('div', {'class': 'entries'})

entries = []
for elem in entry_view:
    for e in elem.find_all('a'):
        entries.append(e.text)

# show the first 5 and last 5 entries, then the total count
print(entries[:5])
print(entries[-5:])
print(len(entries))

Output:

['A1', 'aback', 'abaft', 'abandon', 'abandoned']
['absorbing', 'absorption', 'abstainer', 'abstain from', 'abstemious']
100
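
The same extraction can also be wrapped in a small function that takes an HTML string, which makes it possible to try it without a network request. This is only a minimal sketch: the function name extract_entries and the sample markup are hypothetical, and it assumes the page keeps the div.entries structure used above. The CSS selector is roughly equivalent to the nested find_all calls.

```python
from bs4 import BeautifulSoup


def extract_entries(html):
    """Return the text of every <a> found inside a <div class="entries">."""
    soup = BeautifulSoup(html, "html.parser")
    # Roughly equivalent to find_all('div', {'class': 'entries'}) + find_all('a')
    return [a.get_text(strip=True) for a in soup.select("div.entries a")]


# Hypothetical sample markup, not copied from the real site.
SAMPLE = (
    '<div class="entries"><ul>'
    '<li><a href="#">A1</a></li>'
    '<li><a href="#">aback</a></li>'
    '</ul></div>'
)

entries = extract_entries(SAMPLE)
print(entries)
```

For the real page, the string returned by requests.get(url).text can be passed in place of SAMPLE.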

In your code:

print(type(list))
<class 'bs4.element.Tag'>

print(type(m))
<class 'bs4.element.NavigableString'>

print(type(x))
<class 'str'>

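So, as you can see, the variable x is already a plain str: iterating over a Tag yields its child nodes (including whitespace text nodes), and iterating over one of those NavigableString nodes yields its individual characters. Accessing the bs4 .string attribute on x therefore fails, because .string exists only on bs4 objects, not on str.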
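
The type chain can be reproduced offline with a tiny document. The sketch below assumes a simplified, hypothetical HTML snippet standing in for the real page; it shows what each level of iteration yields and one way to skip the whitespace nodes so that .string is only read from tags.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical inline HTML standing in for the real page (no network needed).
HTML = """
<div class="entries">
<ul>
<li><a href="/thesaurus/aback">aback</a></li>
<li><a href="/thesaurus/abaft">abaft</a></li>
</ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
ul = soup.find("div", {"class": "entries"}).ul

# Iterating over a Tag yields its direct children:
# whitespace text nodes as well as the <li> tags.
children = list(ul)
child_types = [type(c).__name__ for c in children]

# The first child is the newline after <ul>, a NavigableString (a str subclass).
first = children[0]
# Iterating over it yields ordinary str characters, which have no .string.
chars = list(first)

# Keeping only Tag children avoids the error; .string then works on each <a>.
words = [li.a.string for li in children if isinstance(li, Tag)]

print(child_types)
print(words)
```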

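P.S. You shouldn't use variable names like list. It is not a reserved keyword, but it is the name of a built-in type, and assigning to it shadows that built-in.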
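
A short illustration of why rebinding list is risky, using only the standard library. The function name shadowing_demo is hypothetical, and the sketch simply shows that Python accepts the assignment and that later calls to list(...) then fail.

```python
import keyword

# `list` is a built-in name, not a reserved keyword, so Python lets you rebind it.
is_keyword = keyword.iskeyword("list")


def shadowing_demo():
    # Rebinding `list` here hides the built-in type inside this function.
    list = ["a", "b"]
    try:
        list("abc")  # calls the list object above, not the built-in type
    except TypeError as exc:
        return str(exc)
    return None


error_message = shadowing_demo()
print(is_keyword)
print(error_message)
```

Choosing a descriptive name such as word_list avoids the problem entirely.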

Regarding "python - Why can't I get the string when scraping with Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43113232/
