
python - table.decompose(): AttributeError: 'str' object has no attribute 'decompose'

Reposted · Author: 太空宇宙 · Updated: 2023-11-03 19:53:55

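I'm trying to use BeautifulSoup to parse an HTML document. I want code that parses the document, finds all the tables, and deletes every table whose ratio of numeric characters to alphanumeric characters is greater than 15%. I used the code given as the answer to a previous question: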

Delete HTML element if it contains a certain amount of numeric characters

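But for some reason the table.decompose() line raises an error. I'd be grateful for any help. Please note that I'm a beginner, so although I do try, I don't always understand the more complex solutions!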
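The filtering criterion itself (digits divided by digits plus letters, compared against 0.15) can be written as a small pure function, independent of BeautifulSoup. This is a minimal sketch; the name numeric_ratio is hypothetical, and text containing neither letters nor digits is treated as ratio 1.0, mirroring the ZeroDivisionError fallback in the question's code.

```python
def numeric_ratio(text: str) -> float:
    """Return digits / (digits + letters) for `text` (hypothetical helper).

    Falls back to 1.0 when the text has no letters and no digits,
    the same value the question's ZeroDivisionError branch uses.
    """
    numeric = sum(c.isdigit() for c in text)
    alphabetic = sum(c.isalpha() for c in text)
    total = numeric + alphabetic
    if total == 0:
        return 1.0
    return numeric / total


print(numeric_ratio("Revenue 2019: 120"))  # 0.5 (7 digits, 7 letters)
```

A table would then be dropped when numeric_ratio(...) > 0.15 for its text.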

Here is the code:

import re

from bs4 import BeautifulSoup

test_file = 'locationoftestfile.html'


# Define a function to remove tables whose numeric / (alphabetic + numeric) character ratio is > 15%
def remove_table(table):
    table = re.sub('<[^>]*>', ' ', str(table))
    numeric = sum(c.isdigit() for c in table)
    print('numeric: ' + str(numeric))
    alphabetic = sum(c.isalpha() for c in table)
    print('alpha: ' + str(alphabetic))
    try:
        ratio = numeric / float(numeric + alphabetic)
        print('ratio: ' + str(ratio))
    except ZeroDivisionError as err:
        ratio = 1
    if ratio > 0.15:
        table.decompose()


# Define a function to create our Soup object and then extract text
def file_to_text(file):
    soup_file = open(file, 'r')
    soup = BeautifulSoup(soup_file, 'html.parser')
    for table in soup.find_all('table'):
        remove_table(table)
    text = soup.get_text()
    return text


file_to_text(test_file)

Here is the output/error I get:

numeric: 1
alpha: 55
ratio: 0.017857142857142856
numeric: 9
alpha: 88
ratio: 0.09278350515463918
numeric: 20
alpha: 84
ratio: 0.19230769230769232
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-28-c7e380df4fdc> in <module>
----> 1 file_to_text(test_file)

<ipython-input-27-9fb65cec1313> in file_to_text(file)
16 ratio = 1
17 if ratio > 0.15:
---> 18 table.decompose()
19 text = soup.get_text()
20 return text

AttributeError: 'str' object has no attribute 'decompose'

Note that my table.decompose() line differs from what the solution I linked does. That solution uses

    return True
else:
    return False

But, perhaps naively, I don't understand how that would remove the table.
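On the return True / return False variant: a function like that does not delete anything by itself. It only answers "should this table go?", and the caller has to act on the answer by calling decompose() on the Tag. A minimal sketch of that pattern, assuming a hypothetical predicate named is_mostly_numeric with a threshold parameter, and using Tag.get_text() in place of the regex tag-stripping (a swapped-in technique, not the linked answer's code):

```python
from bs4 import BeautifulSoup


def is_mostly_numeric(table, threshold=0.15):
    """Hypothetical predicate: True if the table's text is more than `threshold` digits."""
    text = table.get_text(" ")  # visible text only, no tags
    numeric = sum(c.isdigit() for c in text)
    alphabetic = sum(c.isalpha() for c in text)
    if numeric + alphabetic == 0:
        return True  # same fallback as ratio = 1
    return numeric / (numeric + alphabetic) > threshold


html = """
<p>Intro text</p>
<table><tr><td>Name</td><td>Alice</td></tr></table>
<table><tr><td>2019</td><td>2020</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The predicate only decides; the loop does the actual removal.
for table in soup.find_all("table"):
    if is_mostly_numeric(table):
        table.decompose()

print(len(soup.find_all("table")))  # 1
```

Keeping the decision separate from the deletion also makes the ratio check easy to test on its own.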

Best answer

table = re.sub('<[^>]*>', ' ', str(table))

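This overwrites the parameter "table" with a string, so by the time table.decompose() runs, "table" is a str rather than the BeautifulSoup Tag. You probably want to use a different name for that variable here. For example: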

def remove_table(table):
    table_as_str = re.sub('<[^>]*>', ' ', str(table))
    numeric = sum(c.isdigit() for c in table_as_str)
    print('numeric: ' + str(numeric))
    alphabetic = sum(c.isalpha() for c in table_as_str)
    print('alpha: ' + str(alphabetic))
    try:
        ratio = numeric / float(numeric + alphabetic)
        print('ratio: ' + str(ratio))
    except ZeroDivisionError as err:
        ratio = 1
    if ratio > 0.15:
        table.decompose()
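To see why the rename matters, here is a small sketch (the HTML snippet is made up for illustration) showing what happens to the name table when it is rebound to str(table):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<table><tr><td>42</td></tr></table>", "html.parser")

table = soup.find("table")
print(type(table).__name__)          # Tag

# Rebinding: the name now points at a new str; the Tag is untouched.
table = str(table)
print(type(table).__name__)          # str
print(hasattr(table, "decompose"))   # False
```

Rebinding only changes what the local name refers to. The Tag is still in the tree, which is why decompose() has to be called on the original object, not on its string form.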

Regarding python - table.decompose(): AttributeError: 'str' object has no attribute 'decompose', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59653000/
