This CFSDN blog post, "Python Beautiful Soup library: getting started and installation", was collected and compiled by the author.
```
pip install beautifulsoup4
```
Beautiful Soup is a library for parsing, traversing, and maintaining (modifying) a "tag tree".
```python
from bs4 import BeautifulSoup
import bs4
```
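To make "parse, traverse, maintain" concrete, here is a minimal sketch on a tiny hand-written document (the markup is an assumption for illustration, not the article's demo page). It parses the string, walks down to a tag, then modifies the tree in place.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration
html = "<html><body><p class='intro'>Hello</p></body></html>"

# Parse: build the tag tree with Python's built-in "html.parser"
soup = BeautifulSoup(html, "html.parser")

# Traverse: walk from the root down to the <p> tag
p = soup.body.p
print(p.string)  # Hello

# Maintain: change the text and an attribute in place
p.string = "Hi"
p["class"] = "greeting"
print(soup.body)  # <body><p class="greeting">Hi</p></body>
```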
A BeautifulSoup object corresponds to the entire content of an HTML/XML document.
```python
import requests

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
print(demo)
```
```
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>
```
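If the demo URL is unreachable, the same page can be parsed from a local string. This sketch assumes a copy of the HTML printed above, stored in a hypothetical `DEMO_HTML` constant; it shows that the BeautifulSoup object stands for the whole document (its `.name` is `"[document]"`).

```python
from bs4 import BeautifulSoup

# Local copy of the demo page shown above, so the example runs offline
DEMO_HTML = """<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
</body></html>"""

soup = BeautifulSoup(DEMO_HTML, "html.parser")

# The BeautifulSoup object is the whole document; its name is "[document]"
print(soup.name)          # [document]
print(soup.title.string)  # This is a python demo page
```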
Basic element | Description |
---|---|
Tag | A tag, the most basic unit of information organization; it opens with `<>` and closes with `</>` |
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.title)
tag = soup.a
print(tag)
```
```
<title>This is a python demo page</title>
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
```
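Any tag that appears in the HTML can be accessed as `soup.<tag>`. When the document contains several tags with the same name, `soup.<tag>` returns only the first one.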
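A small sketch of the "first match only" rule, using hypothetical markup with two `<a>` tags. `soup.a` behaves like `soup.find("a")`; `find_all()` (a standard bs4 method, not covered in the article) returns every match, and a tag that does not exist gives `None` instead of raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two <a> tags, no <table>
html = '<p><a id="first">one</a> and <a id="second">two</a></p>'
soup = BeautifulSoup(html, "html.parser")

# soup.a returns the first <a> only (the same object find("a") returns)
print(soup.a["id"])              # first
print(soup.a is soup.find("a"))  # True

# find_all() collects every match
print([a["id"] for a in soup.find_all("a")])  # ['first', 'second']

# A missing tag yields None
print(soup.table)  # None
```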
Basic element | Description |
---|---|
Name | The tag's name; the name of `<p>…</p>` is `'p'`. Format: `<tag>.name` |
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.a.name)
print(soup.a.parent.name)
print(soup.a.parent.parent.name)
```
```
a
p
body
```
Basic element | Description |
---|---|
Attributes | The tag's attributes, organized as a dictionary. Format: `<tag>.attrs` |
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
tag = soup.a
print(tag.attrs)
print(tag.attrs['class'])
print(tag.attrs['href'])
print(type(tag.attrs))
print(type(tag))
```
```
{'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
['py1']
http://www.icourse163.org/course/BIT-268001
<class 'dict'>
<class 'bs4.element.Tag'>
```
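Because `attrs` is a plain dict, bs4 also lets you index the tag directly. This sketch uses the first `<a>` from the demo page as inline markup (assumed for offline use); `tag.get()` is shown as the safe way to read an attribute that may be missing.

```python
from bs4 import BeautifulSoup

html = '<a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a>'
tag = BeautifulSoup(html, "html.parser").a

# tag["name"] is a shortcut for tag.attrs["name"]
print(tag["href"] == tag.attrs["href"])  # True

# "class" is multi-valued in HTML, so bs4 returns a list
print(tag["class"])  # ['py1']

# .get() returns None (or a default) instead of raising KeyError
print(tag.get("title"))              # None
print(tag.get("title", "no title"))  # no title

# attrs can be modified like any dict
tag["target"] = "_blank"
print(tag.attrs["target"])  # _blank
```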
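Tag's NavigableString:

Basic element | Description |
---|---|
NavigableString | The non-attribute string inside a tag, i.e. the text in `<>…</>`. Format: `<tag>.string` |

Tag's Comment:

Basic element | Description |
---|---|
Comment | The comment part of a string inside a tag; a special `Comment` type |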
```python
from bs4 import BeautifulSoup

newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")
print(newsoup.b.string)
print(type(newsoup.b.string))
print(newsoup.p.string)
print(type(newsoup.p.string))
```
```
This is a comment
<class 'bs4.element.Comment'>
This is not a comment
<class 'bs4.element.NavigableString'>
```
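As the output shows, `.string` prints a comment exactly like ordinary text, so the type is the only way to tell them apart. A minimal sketch: since `Comment` is a subclass of `NavigableString`, check for `Comment` first. It also shows that `.string` is `None` when a tag has more than one child (the mixed markup is hypothetical); `get_text()` is the usual alternative there.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

soup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")

kinds = []
for tag in (soup.b, soup.p):
    s = tag.string
    # Comment subclasses NavigableString, so test for Comment first
    if isinstance(s, Comment):
        kinds.append(("comment", str(s)))
    elif isinstance(s, NavigableString):
        kinds.append(("text", str(s)))
print(kinds)

# .string is None when a tag has several child nodes
mixed = BeautifulSoup("<p>a <b>b</b></p>", "html.parser")
print(mixed.p.string)      # None
print(mixed.p.get_text())  # a b
```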
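Attribute | Description |
---|---|
.contents | A list of the child nodes: all of the tag's children are stored in a list |
.children | An iterator over the child nodes; similar to .contents, used to loop over the children |
.descendants | An iterator over all descendant nodes, used to loop over them |

The BeautifulSoup object is the root node of the tag tree.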
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.head)
print(soup.head.contents)
print(soup.body.contents)
print(len(soup.body.contents))
print(soup.body.contents[1])
```
```
<head><title>This is a python demo page</title></head>
[<title>This is a python demo page</title>]
['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>, '\n']
5
<p class="title"><b>The demo python introduces several python courses.</b></p>
```
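The length of 5 comes from the `'\n'` strings between the two `<p>` tags. A minimal sketch of keeping only element children, on a simplified stand-in for the demo `<body>` (the markup is an assumption for illustration):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Simplified stand-in for the demo <body>
html = "<body>\n<p>first</p>\n<p>second</p>\n</body>"
body = BeautifulSoup(html, "html.parser").body

print(len(body.contents))  # 5: '\n', <p>, '\n', <p>, '\n'

# Drop the whitespace strings, keep only Tag children
tags = [c for c in body.contents if isinstance(c, Tag)]
print([t.string for t in tags])  # ['first', 'second']
```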
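```python
# soup is the object created in the previous example
for child in soup.body.children:     # iterate over the direct children
    print(child)

for child in soup.body.descendants:  # iterate over all descendants
    print(child)
```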
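The difference between the two loops is depth: `.children` stops at direct children, while `.descendants` walks the whole subtree depth-first, yielding strings as well as tags. A small sketch with hypothetical nested markup:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup("<div><p>one <b>two</b></p><p>three</p></div>", "html.parser")
div = soup.div

children = list(div.children)        # direct children only: the two <p> tags
descendants = list(div.descendants)  # every node below <div>, depth-first

print(len(children))     # 2
print(len(descendants))  # 6: <p>, 'one ', <b>, 'two', <p>, 'three'
print([d.name for d in descendants if isinstance(d, Tag)])  # ['p', 'b', 'p']
```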
Attribute | Description |
---|---|
.parent | The node's parent tag |
.parents | An iterator over the node's ancestor tags, used to loop over them |
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.title.parent)
print(soup.html.parent)
```
```
<head><title>This is a python demo page</title></head>
<html><head><title>This is a python demo page</title></head>
<body>
<p class="title"><b>The demo python introduces several python courses.</b></p>
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
</body></html>
```
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
for parent in soup.a.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)
```
```
p
body
html
[document]
```
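The loop ends at the BeautifulSoup object (`[document]`); in current bs4 versions the `.parents` generator stops there and never yields `None`. A short sketch with hypothetical markup, collecting the ancestor names and using `find_parent()` (a standard bs4 method, not covered in the article) to jump to the nearest ancestor with a given name:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><p><a href='#'>link</a></p></body></html>", "html.parser")

# Every ancestor of <a>, ending with the BeautifulSoup object itself
names = [parent.name for parent in soup.a.parents]
print(names)  # ['p', 'body', 'html', '[document]']

# Nearest ancestor named "body"
body = soup.a.find_parent("body")
print(body.name)  # body
```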
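Attribute | Description |
---|---|
.next_sibling | The next sibling node (a tag or a string) in HTML text order |
.previous_sibling | The previous sibling node (a tag or a string) in HTML text order |
.next_siblings | An iterator over all following sibling nodes in HTML text order |
.previous_siblings | An iterator over all preceding sibling nodes in HTML text order |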
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.a.next_sibling)
print(soup.a.next_sibling.next_sibling)
print(soup.a.previous_sibling)
print(soup.a.previous_sibling.previous_sibling)
print(soup.a.parent)
```
```
 and 
<a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>
Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:

None
<p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
```
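```python
# soup is the object created in the previous example
for sibling in soup.a.next_siblings:      # iterate over the following siblings
    print(sibling)

for sibling in soup.a.previous_siblings:  # iterate over the preceding siblings
    print(sibling)
```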
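Since siblings include strings such as `' and '`, a common pattern is to filter for `Tag` objects. A minimal sketch on markup modeled loosely on the demo's course paragraph (the markup is hypothetical):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '<p>Intro: <a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>'
first = BeautifulSoup(html, "html.parser").a

# next_siblings yields strings as well as tags
following = [str(s) for s in first.next_siblings]
print(following)  # [' and ', '<a id="link2">Advanced</a>', '.']

# Keep only the sibling tags
sibling_tags = [s["id"] for s in first.next_siblings if isinstance(s, Tag)]
print(sibling_tags)  # ['link2']

preceding = [str(s) for s in first.previous_siblings]
print(preceding)  # ['Intro: ']
```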
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://python123.io/ws/demo.html")
demo = r.text
soup = BeautifulSoup(demo, "html.parser")
print(soup.prettify())
```
```
<html>
 <head>
  <title>
   This is a python demo page
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The demo python introduces several python courses.
   </b>
  </p>
  <p class="course">
   Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
   <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
    Basic Python
   </a>
   and
   <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
    Advanced Python
   </a>
   .
  </p>
 </body>
</html>
```
`.prettify()` adds line breaks (`'\n'`) and indentation around each tag and its content in the HTML text. It can also be called on a single tag: `<tag>.prettify()`.
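A small sketch of calling `prettify()` on one tag rather than the whole soup (hypothetical markup). The result is a `str`; each nesting level is indented by one more space, and the tag you call it on starts at column 0.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="course">Python <a href="#">Basic</a></p>', "html.parser")

# Pretty-print just the <a> tag
print(soup.a.prettify())
# <a href="#">
#  Basic
# </a>

# Nested tags get deeper indentation
print(soup.p.prettify())
```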
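bs4 converts any HTML input to Unicode, and its output is encoded as UTF-8 by default. Python 3.x strings are Unicode and its default source encoding is UTF-8, so Chinese text parses without problems.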
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>中文</p>", "html.parser")
print(soup.p.string)
print(soup.p.prettify())
```
```
中文
<p>
 中文
</p>
```
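When the input is raw bytes (for example `r.content` instead of `r.text`), bs4 decodes it to Unicode first. A minimal sketch, assuming UTF-8 bytes: passing `from_encoding` skips encoding detection, `original_encoding` reports what was used, and `encode()` writes the tree back out as UTF-8 by default.

```python
from bs4 import BeautifulSoup

# UTF-8 bytes, as they might arrive from the network
raw = "<p>中文</p>".encode("utf-8")

# Tell bs4 the encoding instead of letting it guess
soup = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
print(soup.original_encoding)  # utf-8
print(soup.p.string)           # 中文

# encode() produces UTF-8 bytes by default
print(soup.p.encode())  # b'<p>\xe4\xb8\xad\xe6\x96\x87</p>'
```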
Original article: https://blog.csdn.net/weixin_46530492/article/details/119960182