gpt4 book ai didi

python - 使用 BeautifulSoup 从未关闭的特定元标记中提取内容

转载 作者:太空狗 更新时间:2023-10-29 20:33:03 25 4
gpt4 key购买 nike

我正在尝试从特定的元标记中解析出内容。这是元标记的结构。前两个以反斜杠结束,但其余的没有任何结束标签。一旦我获得第三个元标记,<head> 之间的全部内容返回标签。我也试过 soup.findAll(text=re.compile('keyword'))但这不会返回任何内容,因为关键字是元标记的属性。

<meta name="csrf-param" content="authenticity_token"/>
<meta name="csrf-token" content="OrpXIt/y9zdAFHWzJXY2EccDi1zNSucxcCOu8+6Mc9c="/>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'>
<meta content='en_US' http-equiv='Content-Language'>
<meta content='c2y_K2CiLmGeet7GUQc9e3RVGp_gCOxUC4IdJg_RBVo' name='google-site- verification'>
<meta content='initial-scale=1.0,maximum-scale=1.0,width=device-width' name='viewport'>
<meta content='notranslate' name='google'>
<meta content="Learn about Uber's product, founders, investors and team. Everyone's Private Driver - Request a car from any mobile phone—text message, iPhone and Android apps. Within minutes, a professional driver in a sleek black car will arrive curbside. Automatically charged to your credit card on file, tip included." name='description'>

代码如下:

import csv
import re
import sys
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req3 = Request("https://angel.co/uber", headers={'User-Agent': 'Mozilla/5.0')
page3 = urlopen(req3).read()
soup3 = BeautifulSoup(page3)

## This returns the entire web page since the META tags are not closed
desc = soup3.findAll(attrs={"name":"description"})

最佳答案

编辑:按照@Albert Chen 的建议添加了区分大小写的正则表达式。

Python 3 编辑:

from bs4 import BeautifulSoup
import re
import urllib.request

page3 = urllib.request.urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3)

desc = soup3.findAll(attrs={"name": re.compile(r"description", re.I)})
print(desc[0]['content'])

虽然我不确定它是否适用于每个页面:

from bs4 import BeautifulSoup
import re
import urllib

page3 = urllib.urlopen("https://angel.co/uber").read()
soup3 = BeautifulSoup(page3)

desc = soup3.findAll(attrs={"name": re.compile(r"description", re.I)})
print(desc[0]['content'].encode('utf-8'))

产量:

Learn about Uber's product, founders, investors and team. Everyone's Private Dri
ver - Request a car from any mobile phoneΓÇötext message, iPhone and Android app
s. Within minutes, a professional driver in a sleek black car will arrive curbsi
de. Automatically charged to your credit card on file, tip included.

关于python - 使用 BeautifulSoup 从未关闭的特定元标记中提取内容,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/18134318/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com