
python - Computing the average height and average width of div tags

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:20:59

I need to get the average div height and width of an HTML document.

I have tried the following solution, but it does not work:

import numpy as np
average_width = np.mean([div.attrs['width'] for div in my_doc.get_div() if 'width' in div.attrs])
average_height = np.mean([div.attrs['height'] for div in my_doc.get_div() if 'height' in div.attrs])
print average_height,average_width

The get_div method returns a list of all the divs retrieved by BeautifulSoup's find_all method.

Here is an example:

print my_doc.get_div()[1]

<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>

When I get the attributes, it works perfectly:

print my_doc.get_div()[1].attrs

{u'style': u'position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;'}

But when I try to get the value:

print my_doc.get_div()[1].attrs['width']

I get an error:

KeyError: 'width'

I don't understand why, because when I check the type:

print type(my_doc.get_div()[1].attrs)

it is a dictionary: <type 'dict'>

Best answer

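There may be a better way, but here are two that work. The KeyError occurs because width and height are not attributes of the div: they are CSS declarations inside the string stored under the 'style' key, so they have to be parsed out of that string.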
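To see where the lookup goes wrong, the attrs output shown in the question can be reproduced with a plain dict. This is only an illustration; the variable names are made up for the example.

```python
# The attrs dict as printed in the question: it has a single key, 'style'.
attrs = {
    'style': 'position:absolute; border: textbox 1px solid; '
             'writing-mode:lr-tb; left:45px; top:81px; '
             'width:127px; height:9px;'
}

has_width_key = 'width' in attrs              # the tag has no 'width' attribute
width_in_style = 'width:' in attrs['style']   # the value sits inside the CSS text

print(has_width_key, width_in_style)
```

So `attrs['width']` raises KeyError even though attrs is a dict, and the size has to be extracted from `attrs['style']`.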

Approach 1

Below is the code I tested for extracting the width and height:

from bs4 import BeautifulSoup

html_doc = '''<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>'''

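soup = BeautifulSoup(html_doc, 'html.parser')
# Collect the style attribute of every div that has one
my_att = [i.attrs['style'] for i in soup.find_all("div") if 'style' in i.attrs]
# Split the CSS into "property:value" declarations and drop empty pieces
dd = ';'.join(my_att).split(";")
dd_cln = [i.strip() for i in dd if i.strip()]
# Split on the first ':' only; with several divs, later values overwrite earlier ones
my_dict = dict(i.split(':', 1) for i in dd_cln)
print(my_dict['width'])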
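Because the code above merges every div into one dict, it may help to parse each style string on its own. The following is a minimal sketch of that idea; `parse_style` is a hypothetical helper written for this example, not part of BeautifulSoup, and it does not try to handle semicolons inside quoted CSS values.

```python
def parse_style(style):
    """Turn an inline CSS string into a {property: value} dict."""
    declarations = {}
    for chunk in style.split(';'):
        # partition() splits on the first ':' only, so a value that itself
        # contains a colon (for example a url(...)) is kept intact.
        name, sep, value = chunk.partition(':')
        if sep and name.strip():
            declarations[name.strip().lower()] = value.strip()
    return declarations


# The style string of the div shown in the question.
style = ('position:absolute; border: textbox 1px solid; writing-mode:lr-tb; '
         'left:45px; top:81px; width:127px; height:9px;')
props = parse_style(style)
print(props['width'], props['height'])
```

With BeautifulSoup, the input string for each div could be obtained with `div.get('style', '')`, which returns an empty string for divs that have no style attribute.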

Approach 2 uses a regular expression (the original answer linked to a description of this technique).

Working code:

import numpy as np
import re
from bs4 import BeautifulSoup

html_doc = '''<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:45px; top:81px; width:127px; height:9px;">
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">Journal of Infection (2015)
</span>
<span style="font-family: EICMDB+AdvTrebu-B; font-size:8px">xx</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">, 1</span>
<span style="font-family: EICMDD+AdvPS44A44B; font-size:7px">e</span>
<span style="font-family: EICMDA+AdvTrebu-R; font-size:8px">4
<br/>
</span>
</div>'''

soup = BeautifulSoup(html_doc, 'html.parser')
my_att = [i.attrs['style'] for i in soup.find_all("div") if 'style' in i.attrs]
css = ';'.join(my_att)
print(css)
# Build lists rather than using map(): in Python 3, map() returns an iterator that np.mean cannot average
width_list = [float(w) for w in re.findall(r'(?<=width:)(\d+)(?=px;)', css)]
height_list = [float(h) for h in re.findall(r'(?<=height:)(\d+)(?=px;)', css)]
print(np.mean(height_list))
print(np.mean(width_list))
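The pattern above expects exactly `width:NNNpx;` and would also pick up properties such as `max-width`. A slightly more tolerant variant is sketched below using only the standard library. `average_px` is a hypothetical helper for this example, and the second and third style strings are invented test data, not taken from the question.

```python
import re
from statistics import mean


def average_px(styles, prop):
    """Average the px value of `prop` over a list of inline style strings."""
    # (?<![\w-]) stops 'width' from matching inside 'max-width' or 'border-width'.
    pattern = re.compile(
        r'(?<![\w-])' + re.escape(prop) + r'\s*:\s*(\d+(?:\.\d+)?)px'
    )
    values = []
    for style in styles:
        match = pattern.search(style)
        if match:
            values.append(float(match.group(1)))
    # Return None rather than failing when no div carries the property.
    return mean(values) if values else None


styles = [
    'left:45px; top:81px; width:127px; height:9px;',
    'left:45px; top:95px; width: 73px; height:11px;',   # invented
    'color:red;',                                       # invented, no size
]
avg_width = average_px(styles, 'width')
avg_height = average_px(styles, 'height')
print(avg_width, avg_height)
```

Only values given in px are counted here; sizes in other units (em, %, pt) would need their own handling before they could be averaged together.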

Regarding python - Computing the average height and average width of div tags, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/33151188/
