Python Beautiful Soup: A Beginner's Installation and Usage Tutorial

Reposted by: qq735679552. Last updated: 2022-09-27 22:32:09


Installing Beautiful Soup

 
    pip install beautifulsoup4

Understanding Beautiful Soup

Beautiful Soup is a library for parsing, traversing, and maintaining the "tag tree" of a document.

Importing Beautiful Soup

 
    from bs4 import BeautifulSoup   # import the BeautifulSoup class directly
    import bs4                      # or import the whole package

The BeautifulSoup class

A BeautifulSoup object corresponds to the entire content of an HTML/XML document.
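
As a rough illustration of the statement above, the sketch below builds a BeautifulSoup object from an inline string rather than a downloaded page. The HTML snippet and the variable names are made up for this example; "html.parser" is the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical inline document, used instead of a network request.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# One BeautifulSoup object holds the whole parsed document.
soup = BeautifulSoup(html, "html.parser")

print(type(soup))         # the BeautifulSoup class itself
print(soup.title.string)  # text inside <title>
print(soup.p.string)      # text inside the first <p>
```

Parsing from a string like this is handy for experiments, since it needs no network access.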

Revisiting demo.html

 
    import requests

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    print(demo)

 
    <html><head><title>This is a python demo page</title></head>
    <body>
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a href="http://www.icourse163.org/course/BIT-268001" class="py1" id="link1">Basic Python</a> and <a href="http://www.icourse163.org/course/BIT-1001870001" class="py2" id="link2">Advanced Python</a>.</p>
    </body></html>
        
 
     
 
   

Tag

Basic element	Description
Tag	A tag, the most basic unit of information organization; its start and end are marked with <> and </>

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    print(soup.title)
    tag = soup.a
    print(tag)

 
    <title>This is a python demo page</title>
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>

Any tag that exists in the HTML can be accessed with soup.<tag>. When the document contains several tags with the same name, soup.<tag> returns the first one.
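
To make the "first one only" behaviour concrete, here is a minimal sketch on a made-up snippet. It also uses find_all(), which the text above does not cover, as one way to reach the remaining tags; the HTML string and variable names are assumptions for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two <a> tags.
html = '<p><a id="link1">Basic Python</a> and <a id="link2">Advanced Python</a></p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.a                  # attribute access: the first <a> only
all_links = soup.find_all("a")  # find_all: every <a>, in document order

print(first["id"])
print([a["id"] for a in all_links])
```

Attribute access is convenient for unique tags such as title; for repeated tags a search method is usually the safer choice.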

Tag name

Basic element	Description
Name	The tag's name; the name of <p>…</p> is 'p'. Format: <tag>.name

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    print(soup.a.name)                # a
    print(soup.a.parent.name)         # p
    print(soup.a.parent.parent.name)  # body

Tag attrs (attributes)

Basic element	Description
Attributes	The tag's attributes, organized as a dictionary. Format: <tag>.attrs

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    tag = soup.a
    print(tag.attrs)
    print(tag.attrs['class'])
    print(tag.attrs['href'])
    print(type(tag.attrs))
    print(type(tag))

 
    {'href': 'http://www.icourse163.org/course/BIT-268001', 'class': ['py1'], 'id': 'link1'}
    ['py1']
    http://www.icourse163.org/course/BIT-268001
    <class 'dict'>
    <class 'bs4.element.Tag'>
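
Since .attrs is a plain dictionary, a missing key raises KeyError. The sketch below shows the shorthand tag[...] and the Tag.get() method as a more forgiving alternative. The inline HTML mirrors the first link of demo.html; the variable names are chosen only for this example.

```python
from bs4 import BeautifulSoup

html = ('<a href="http://www.icourse163.org/course/BIT-268001" '
        'class="py1" id="link1">Basic Python</a>')
soup = BeautifulSoup(html, "html.parser")
tag = soup.a

href = tag["href"]                  # shorthand for tag.attrs["href"]
classes = tag.get("class")          # class is multi-valued, so this is a list
missing = tag.get("title")          # absent attribute: None instead of KeyError
fallback = tag.get("title", "n/a")  # optional default value

print(href, classes, missing, fallback)
```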
        
 
     
 
   

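Tag NavigableString

Basic element	Description
NavigableString	The non-attribute string inside a tag, i.e. the string between <> and </>. Format: <tag>.string

Tag Comment

Basic element	Description
Comment	The comment part of the string inside a tag; Comment is a special kind of NavigableString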

 
    from bs4 import BeautifulSoup

    newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")
    print(newsoup.b.string)
    print(type(newsoup.b.string))
    print(newsoup.p.string)
    print(type(newsoup.p.string))

 
    This is a comment
    <class 'bs4.element.Comment'>
    This is not a comment
    <class 'bs4.element.NavigableString'>
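
Because .string returns the comment text without the <!-- --> markers, printed output alone cannot tell a comment from ordinary text. One possible way to separate them is a type check, sketched below. The helper visible_text is hypothetical and not part of bs4.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>",
    "html.parser",
)

def visible_text(tag):
    """Return tag.string as str, or None if it is missing or an HTML comment."""
    s = tag.string
    if s is None or isinstance(s, Comment):
        return None
    return str(s)

b_text = visible_text(newsoup.b)  # the <b> tag holds only a comment
p_text = visible_text(newsoup.p)  # the <p> tag holds ordinary text
print(b_text, p_text)
```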

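Basic HTML structure

Downward traversal of the tag tree

Attribute	Description
.contents	List of child nodes; all children are stored in a list
.children	Iterator over child nodes; similar to .contents, used to loop over the children
.descendants	Iterator over all descendant nodes, used to loop over the whole subtree

The BeautifulSoup object is the root node of the tag tree.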

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    print(soup.head)
    print(soup.head.contents)
    print(soup.body.contents)
    print(len(soup.body.contents))
    print(soup.body.contents[1])

 
    <head><title>This is a python demo page</title></head>
    [<title>This is a python demo page</title>]
    ['\n', <p class="title"><b>The demo python introduces several python courses.</b></p>, '\n', <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>, '\n']
    5
    <p class="title"><b>The demo python introduces several python courses.</b></p>
        
 
     
 
   

 
    for child in soup.body.children:
        print(child)   # iterate over the child nodes
    for child in soup.body.descendants:
        print(child)   # iterate over all descendant nodes
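
As the .contents output shows, the children of body include '\n' strings as well as tags. If only tags are wanted, one option is to filter by type. This is a sketch on a made-up snippet shaped like demo.html, not the only way to do it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical document with the same shape as the body of demo.html.
html = "<body>\n<p class='title'><b>Title</b></p>\n<p class='course'>Text</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# .children yields the '\n' strings too; keep only Tag objects.
child_tags = [child.name for child in soup.body.children if isinstance(child, Tag)]

# .descendants walks the whole subtree, parents before their children.
descendant_tags = [node.name for node in soup.body.descendants if isinstance(node, Tag)]

print(len(soup.body.contents), child_tags, descendant_tags)
```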

Upward traversal of the tag tree

Attribute	Description
.parent	The node's parent tag
.parents	Iterator over the node's ancestor tags, used to loop over the ancestors

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    print(soup.title.parent)
    print(soup.html.parent)

 
    <head><title>This is a python demo page</title></head>
    <html><head><title>This is a python demo page</title></head>
    <body>
    <p class="title"><b>The demo python introduces several python courses.</b></p>
    <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
    </body></html>
        
 
     
 
   

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    for parent in soup.a.parents:
        if parent is None:
            print(parent)
        else:
            print(parent.name)
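
The chain of ancestors can also be collected into a list, which makes the order easy to inspect. The sketch below uses a made-up inline document. Note that the last ancestor is the BeautifulSoup object itself, whose name is '[document]' and whose own parent is None.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document.
html = "<html><body><p><a href='#'>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Names of all ancestors of the first <a>, nearest first.
ancestor_names = [parent.name for parent in soup.a.parents]

root_parent = soup.parent  # the root has no parent

print(ancestor_names, root_parent)
```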

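Sibling traversal of the tag tree

Attribute	Description
.next_sibling	Returns the next sibling node in HTML text order
.previous_sibling	Returns the previous sibling node in HTML text order
.next_siblings	Iterator over all following sibling nodes in HTML text order
.previous_siblings	Iterator over all preceding sibling nodes in HTML text order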

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    print(soup.a.next_sibling)
    print(soup.a.next_sibling.next_sibling)
    print(soup.a.previous_sibling)
    print(soup.a.previous_sibling.previous_sibling)
    print(soup.a.parent)

 
     and 
    <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>
    Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:

    None
    <p class="course">Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
    <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a> and <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>.</p>
        
 
     
 
   

 
    for sibling in soup.a.next_siblings:
        print(sibling)   # iterate over the following siblings
    for sibling in soup.a.previous_siblings:
        print(sibling)   # iterate over the preceding siblings
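
Siblings are not necessarily tags: text between tags counts as a sibling node too. The following sketch, on a made-up one-line paragraph, collects the siblings of the first <a> into lists so that this is visible.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph with text and two links side by side.
html = '<p>Courses: <a id="link1">Basic Python</a> and <a id="link2">Advanced Python</a>.</p>'
soup = BeautifulSoup(html, "html.parser")

# Convert each sibling (string or tag) to its text form for display.
following = [str(s) for s in soup.a.next_siblings]
preceding = [str(s) for s in soup.a.previous_siblings]

print(following)
print(preceding)
```

Here the following siblings are a string, a tag, and another string, so code that expects only tags should check the node type first.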


The bs4 prettify() method

 
    import requests
    from bs4 import BeautifulSoup

    r = requests.get("http://python123.io/ws/demo.html")
    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    print(soup.prettify())

 
    <html>
     <head>
      <title>
       This is a python demo page
      </title>
     </head>
     <body>
      <p class="title">
       <b>
        The demo python introduces several python courses.
       </b>
      </p>
      <p class="course">
       Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:
       <a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">
        Basic Python
       </a>
       and
       <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">
        Advanced Python
       </a>
       .
      </p>
     </body>
    </html>

.prettify() adds '\n' line breaks after each tag and its content in the HTML text. It can also be called on an individual tag: <tag>.prettify().
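
A small sketch of calling prettify() on a single tag rather than on the whole document follows. The snippet is made up for this example, and the exact whitespace of the result may differ slightly between bs4 versions.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>bold</b> text</p>", "html.parser")

# Format only the <b> subtree; the rest of the document is left out.
pretty = soup.b.prettify()
print(pretty)
```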

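Encoding in bs4

bs4 converts any HTML input to Unicode internally and writes its output as UTF-8 by default. Python 3.x strings are Unicode and its default source encoding is UTF-8, so parsing works without extra effort.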

 
    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p>中文</p>", "html.parser")
    print(soup.p.string)
    print(soup.p.prettify())
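
The example above starts from a Python str, which is already Unicode. To see the conversion at work, the sketch below feeds bs4 raw bytes in a different encoding. GBK is chosen purely for illustration, and from_encoding is the constructor argument that names the input encoding.

```python
from bs4 import BeautifulSoup

# Bytes in a non-UTF-8 encoding (GBK is an arbitrary choice for this demo).
raw = "<p>中文</p>".encode("gbk")

# from_encoding tells bs4 how to decode the incoming bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

text = soup.p.string                 # a Unicode string after parsing
utf8_bytes = soup.p.encode("utf-8")  # serialize the tag back out as UTF-8 bytes

print(text)
print(utf8_bytes)
```

When the encoding is not passed explicitly, bs4 tries to detect it, which can guess wrong on short inputs, so naming it is the more predictable route when it is known.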


Original article: https://blog.csdn.net/weixin_46530492/article/details/119960182
