
python-3.x - How to remove "\n\r\n " from BeautifulSoup output

Reposted. Author: 行者123. Updated: 2023-12-05 06:30:58

I have code like this:

from bs4 import BeautifulSoup
import requests
import re

# Read the saved HTML file and parse it with the lxml parser.
page = open('doc1.html', 'rb').read()
soup = BeautifulSoup(page, 'lxml')
# print(soup.prettify())

# eng = soup.find_all(string = re.compile("righteou"))
# print(eng)

# heb = soup.findAll('p',{'dir':'RTL'})
# print(heb)
results = []
all_tr = soup.findAll('tr')
for td in all_tr:
    # Note: this searches the whole document rather than the current row,
    # so every iteration reads the same first two cells.
    all_td = soup.findAll('td')
    d = {
        'hob': all_td[0].text.strip(),
        'english': all_td[1].text.strip()
    }
    results.append(d)
print(results)

My output looks like this:

[{'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n                    We need to understand\r\n                    \r\n      the idea that the Torah was given specifically on Mount\r\n Sinai,\r\n '}, {'hob': 'עִנְיָן שֶׁנִּיתְּנָה הַתּוֹרָה עַל הַר סִינַי דַּוְקָא,', 'english': '\n\r\n We need to understand\r\n \r\n the idea that the Torah was given specifically on Mount\r\n Sinai,\r\n '}, ...

I want to remove the "\n\r\n " sequences from this output so that my file is clean. How can I do that?

Best answer

Split the text into words and join them back together with single spaces.

'english':" ".join(all_td[1].text.split())

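This removes every "\n" and "\r" and collapses repeated spaces, because split() with no arguments splits on any run of whitespace and ignores leading and trailing whitespace.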
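To see the difference from strip(), here is a minimal sketch that uses only the standard library. The helper name clean_text and the sample string are illustrative assumptions, not part of the original code; the sample only imitates the kind of text a table cell returns.

```python
def clean_text(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    # str.split() with no argument splits on any whitespace run
    # (spaces, "\n", "\r", "\t") and never yields empty strings.
    return " ".join(text.split())


# Hypothetical sample resembling the 'english' value from the question.
raw = (
    "\n\r\n      We need to understand\r\n      \r\n   "
    "the idea that the Torah was given specifically on Mount\r\n Sinai,\r\n "
)

print(repr(raw.strip()))      # strip() trims only the ends; inner "\r\n" runs remain
print(repr(clean_text(raw)))  # inner whitespace runs are collapsed as well
```

In the loop from the question, the same idea would be applied as 'english': clean_text(all_td[1].text).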

Regarding python-3.x - How to remove "\n\r\n " from BeautifulSoup output, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51967574/
