python - Remove the first child node using BeautifulSoup

Reposted. Author: 太空狗. Updated: 2023-10-29 14:14:27

import os
from bs4 import BeautifulSoup

do = dir_with_original_files = 'C:\FOLDER'
dm = dir_with_modified_files = 'C:\FOLDER'
for root, dirs, files in os.walk(do):
    for f in files:
        print f.title()
        if f.endswith('~'):  # you don't want to process backups
            continue
        original_file = os.path.join(root, f)
        mf = f.split('.')
        mf = ''.join(mf[:-1])+'_mod.'+mf[-1]  # you can keep the same name
                                              # if you omit the last two lines.
                                              # They are in separate directories
                                              # anyway. In that case, mf = f
        modified_file = os.path.join(dm, mf)
        with open(original_file, 'r') as orig_f, \
             open(modified_file, 'w') as modi_f:
            soup = BeautifulSoup(orig_f.read())

            for t in soup.find_all('table'):
                for child in t.find_all("table"):  # *****this is fine for now, but how would I restrict it to find only the first element?
                    child.REMOVE()  # ******PROBLEM HERE********

            # This is where you create your new modified file.
            modi_f.write(soup.prettify().encode(soup.original_encoding))

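Hello everyone,

I am trying to use BeautifulSoup to parse some files in order to clean them up a bit. What I want is to delete the first table found anywhere inside another table, for example: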

<table>
  <tr>
    <td></td>
  </tr>
  <tr>
    <td><table></table><-----This will be deleted</td>
  </tr>
  <tr>
    <td><table></table> --- this will remain here.</td>
  </tr>
</table>

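Currently, my code is set up to find all tables within a table, and I wrote a made-up .REMOVE() method to show what I am hoping to accomplish. How can I actually remove this element?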

Tl;dr -

  • How can I adapt my code to find only the first nested table?

  • How do I delete that table?

Best answer

Find the first table inside the table and call extract() on it:

inner_table = soup.find('table').find('table')  # or just soup.table.table
inner_table.extract()
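
As a minimal sketch of the same idea applied to the example markup from the question: the helper name remove_first_nested_table and the id attributes are hypothetical, added only so the result is easy to inspect, and the built-in "html.parser" backend is an assumption (the question does not say which parser is in use). The sketch also guards against find() returning None when no matching table exists.

```python
from bs4 import BeautifulSoup

# Example markup modelled on the question; the id attributes are
# illustrative additions so the tables can be told apart.
html = """
<table id="outer">
  <tr><td></td></tr>
  <tr><td><table id="first"></table></td></tr>
  <tr><td><table id="second"></table></td></tr>
</table>
"""


def remove_first_nested_table(soup):
    """Detach the first <table> nested inside the first <table>, if any."""
    outer = soup.find("table")
    if outer is None:
        return None
    # find() returns only the first matching descendant, or None.
    inner = outer.find("table")
    if inner is None:
        return None
    # extract() removes the tag from the tree and returns it.
    return inner.extract()


soup = BeautifulSoup(html, "html.parser")
removed = remove_first_nested_table(soup)
remaining_ids = [t.get("id") for t in soup.find_all("table")]

print(removed.get("id"))  # first
print(remaining_ids)      # ['outer', 'second']
```

If the removed tag is not needed afterwards, decompose() can be called in place of extract(); it destroys the tag instead of returning it.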

Regarding python - Remove the first child node using BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27319284/
