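I am trying to use BeautifulSoup in Python to extract some text from HTML (more specifically, from an MDX file, a dictionary file format). The code runs fine, but I am not getting what I need. My code is as follows: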
from bs4 import BeautifulSoup
from lxml import etree
html = '''
abandon <link href="LM5style_vanilla.css" rel="stylesheet" type="text/css" /><link href="LM5style.css" rel="stylesheet" type="text/css" /><link href="LM5style_switch.css" rel="stylesheet" type="text/css" /><link href="LM5style_show.css" rel="stylesheet" type="text/css" /><script src="jquery-3.2.1.min.js" charset="utf-8" type="text/javascript" language="javascript"></script><script src="LM5Switch.js" charset="utf-8" type="text/javascript" language="javascript"></script><span class="lm5ppbody"><div class="entry_content"><h1 class="pagetitle" pagetype="0">abandon</h1><div class="dictionary"><div class="wordfams"><span class="LDOCE5pp_sensefold foldsign_fold"><span class="asset_intro">Word family</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span><span class="LDOCE_word_family" style="display:none;"> <span class="pos">noun</span> <span class="w" title="abandonment">abandonment</span> <span class="pos">adjective</span> <a class="crossRef w" href="bword://abandoned" title="abandoned">abandoned</a> <span class="pos">verb</span> <span class="w" title="abandon">abandon</span> </span></div><!-- End of DIV wordfams--><span class="dictentry"><span class="dictionary_intro span"><span class="lm5ppMenu"><span id="lm5ppMenu_logo"> </span><span class="lm5ppMenu_title"><span class="en_title">Longman Dictionary of Contemporary English 5++</span><span class="cn_title"><span class="cn_txt_menu">朗文当代英语 5++</span></span></span><span class="lm5ppMenu_title mini"><span class="en_title">LDOCE 5++</span><span class="cn_title"><span class="cn_txt_menu">朗文 5++</span></span></span></span></span><span class="dictlink"><a name="abandon__entry_0__a"></a><span class="ldoceEntry Entry" id="abandon__entry_0"><span class="frequent Head"><span class="HWD">a<span class="HYP"><span class="HYP">·</span></span>ban<span class="HYP"><span class="HYP">·</span></span>don</span><span class="HOMNUM">1</span><a 
class="PronCodes" href="sound://media/english/ameProns/abandon1.mp3"><span class="neutral span"> /</span><span class="PRON">əˈbændən</span><span class="neutral span">/</span></a> <span class="tooltip LEVEL" title="Core vocabulary: Medium-frequency"> ●●○</span> <span class="FREQ" title="Top 3000 written words">W3</span> <span class="AC" title="Academic Word list">AWL</span><span class="lm5pp_POS"> verb</span><span class="GRAM"><span class="neutral span"> [</span>transitive<span class="neutral span">]</span></span><a class="speaker brefile fa fa-volume-up" data-src-mp3="/media/english/breProns/abandon_v0205.mp3" href="sound://media/english/breProns/abandon_v0205.mp3" title="Play British pronunciation of abandon"> </a><a class="speaker amefile fa fa-volume-up" data-src-mp3="/media/english/ameProns/abandon1.mp3" href="sound://media/english/ameProns/abandon1.mp3" title="Play American pronunciation of abandon"> </a></span><a name="abandon__1__a"></a><span class="newline Sense" id="abandon__1"><span class="LDOCE5pp_sensefold"><span class="sensenum span">1</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span> <span class="ACTIV">LEAVE A RELATIONSHIP</span><span class="DEF LDOCE_switch_lang switch_siblings">to leave someone, especially someone you are <a class="defRef" href="bword://responsible" title="responsible">responsible</a> for</span><span class="DEF LDOCE_switch_lang switch_siblings"> <span class="cn_txt"> 抛弃,遗弃〔某人〕</span></span><span class="RELATEDWD"><span class="neutral span"> → </span><a href="bword://abandoned"> abandoned</a></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963493.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">How could she abandon her own child?<span class="cn_txt"> 她怎么能抛弃自己的孩子呢?</span></span></span></span><a name="abandon__2__a"></a><span 
class="newline Sense" id="abandon__2"><span class="LDOCE5pp_sensefold"><span class="sensenum span">2</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span> <span class="ACTIV">LEAVE A PLACE</span><span class="DEF LDOCE_switch_lang switch_siblings">to go away from a place, <a class="defRef" href="bword://vehicle" title="vehicle">vehicle</a> etc permanently, especially because the situation makes it <a class="defRef" href="bword://impossible" title="impossible">impossible</a> for you to stay</span><span class="DEF LDOCE_switch_lang switch_siblings"> <span class="cn_txt"> 离弃,逃离〔某地方、交通工具等〕</span></span><span class="SYN"> <span class="synopp span">SYN</span><a href="bword://leave"> leave</a></span><span class="RELATEDWD"><span class="neutral span">, → </span><a href="bword://abandoned"> abandoned</a></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963497.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">We had to abandon the car and walk the rest of the way.<span class="cn_txt"> 我们只好弃车,步行走完剩下的路。</span></span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963498.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">Fearing further attacks, most of the population had abandoned the city.<span class="cn_txt"> 因为害怕还要受到袭击,大多数市民已逃离该市。</span></span></span></span><a name="abandon__3__a"></a><span class="newline Sense" id="abandon__3"><span class="LDOCE5pp_sensefold"><span class="sensenum span">3</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span> <span class="ACTIV">STOP DOING something</span><span class="DEF LDOCE_switch_lang switch_siblings">to stop doing something because there are too many 
problems and it is impossible to continue</span><span class="DEF LDOCE_switch_lang switch_siblings"> <span class="cn_txt"> 放弃,中止</span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963502.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">The game had to be abandoned due to bad weather.<span class="cn_txt"> 由于天气不好,比赛不得不中止。</span></span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-001732862.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">They <span class="COLLOINEXA">abandoned</span> their <span class="COLLOINEXA">attempt</span> to recapture the castle.<span class="cn_txt"> 他们放弃了夺回城堡的努力。</span></span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-001776706.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">Because of the fog they <span class="COLLOINEXA">abandoned</span> their <span class="COLLOINEXA"<span>someone, </span><span>you </span></div></div>\n</span>\n
'''
soup = BeautifulSoup(html, 'lxml')
context = soup.find_all(class_="english LDOCE_switch_lang switch_children")
print(context)
# This is what it prints: [<span class="english LDOCE_switch_lang switch_children">How could she abandon her own child?<span class="cn_txt"> 她怎么能抛弃自己的孩子呢?</span></span>, <span class="english LDOCE_switch_lang switch_children">We had to abandon the car and walk the rest of the way.<span class="cn_txt"> 我们只好弃车,步行走完剩下的路。</span></span>, <span class="english LDOCE_switch_lang switch_children">Fearing further attacks, most of the population had abandoned the city.<span class="cn_txt"> 因为害怕还要受到袭击,大多数市民已逃离该市。</span></span>,
What I need is all of the English and Chinese example sentences, like this:
How could she abandon her own child?
她怎么能抛弃自己的孩子呢?
I have been trying for several days. Please help me. Thank you very much!
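Best answer
I hope I have understood your question correctly. If you want to extract the English phrases and their Chinese counterparts, you can use this example (I don't know Chinese, so I cannot verify that this is the correct output):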
from bs4 import BeautifulSoup
html = '''
abandon <link href="LM5style_vanilla.css" rel="stylesheet" type="text/css" /><link href="LM5style.css" rel="stylesheet" type="text/css" /><link href="LM5style_switch.css" rel="stylesheet" type="text/css" /><link href="LM5style_show.css" rel="stylesheet" type="text/css" /><script src="jquery-3.2.1.min.js" charset="utf-8" type="text/javascript" language="javascript"></script><script src="LM5Switch.js" charset="utf-8" type="text/javascript" language="javascript"></script><span class="lm5ppbody"><div class="entry_content"><h1 class="pagetitle" pagetype="0">abandon</h1><div class="dictionary"><div class="wordfams"><span class="LDOCE5pp_sensefold foldsign_fold"><span class="asset_intro">Word family</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span><span class="LDOCE_word_family" style="display:none;"> <span class="pos">noun</span> <span class="w" title="abandonment">abandonment</span> <span class="pos">adjective</span> <a class="crossRef w" href="bword://abandoned" title="abandoned">abandoned</a> <span class="pos">verb</span> <span class="w" title="abandon">abandon</span> </span></div><!-- End of DIV wordfams--><span class="dictentry"><span class="dictionary_intro span"><span class="lm5ppMenu"><span id="lm5ppMenu_logo"> </span><span class="lm5ppMenu_title"><span class="en_title">Longman Dictionary of Contemporary English 5++</span><span class="cn_title"><span class="cn_txt_menu">朗文当代英语 5++</span></span></span><span class="lm5ppMenu_title mini"><span class="en_title">LDOCE 5++</span><span class="cn_title"><span class="cn_txt_menu">朗文 5++</span></span></span></span></span><span class="dictlink"><a name="abandon__entry_0__a"></a><span class="ldoceEntry Entry" id="abandon__entry_0"><span class="frequent Head"><span class="HWD">a<span class="HYP"><span class="HYP">·</span></span>ban<span class="HYP"><span class="HYP">·</span></span>don</span><span class="HOMNUM">1</span><a 
class="PronCodes" href="sound://media/english/ameProns/abandon1.mp3"><span class="neutral span"> /</span><span class="PRON">əˈbændən</span><span class="neutral span">/</span></a> <span class="tooltip LEVEL" title="Core vocabulary: Medium-frequency"> ●●○</span> <span class="FREQ" title="Top 3000 written words">W3</span> <span class="AC" title="Academic Word list">AWL</span><span class="lm5pp_POS"> verb</span><span class="GRAM"><span class="neutral span"> [</span>transitive<span class="neutral span">]</span></span><a class="speaker brefile fa fa-volume-up" data-src-mp3="/media/english/breProns/abandon_v0205.mp3" href="sound://media/english/breProns/abandon_v0205.mp3" title="Play British pronunciation of abandon"> </a><a class="speaker amefile fa fa-volume-up" data-src-mp3="/media/english/ameProns/abandon1.mp3" href="sound://media/english/ameProns/abandon1.mp3" title="Play American pronunciation of abandon"> </a></span><a name="abandon__1__a"></a><span class="newline Sense" id="abandon__1"><span class="LDOCE5pp_sensefold"><span class="sensenum span">1</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span> <span class="ACTIV">LEAVE A RELATIONSHIP</span><span class="DEF LDOCE_switch_lang switch_siblings">to leave someone, especially someone you are <a class="defRef" href="bword://responsible" title="responsible">responsible</a> for</span><span class="DEF LDOCE_switch_lang switch_siblings"> <span class="cn_txt"> 抛弃,遗弃〔某人〕</span></span><span class="RELATEDWD"><span class="neutral span"> → </span><a href="bword://abandoned"> abandoned</a></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963493.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">How could she abandon her own child?<span class="cn_txt"> 她怎么能抛弃自己的孩子呢?</span></span></span></span><a name="abandon__2__a"></a><span 
class="newline Sense" id="abandon__2"><span class="LDOCE5pp_sensefold"><span class="sensenum span">2</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span> <span class="ACTIV">LEAVE A PLACE</span><span class="DEF LDOCE_switch_lang switch_siblings">to go away from a place, <a class="defRef" href="bword://vehicle" title="vehicle">vehicle</a> etc permanently, especially because the situation makes it <a class="defRef" href="bword://impossible" title="impossible">impossible</a> for you to stay</span><span class="DEF LDOCE_switch_lang switch_siblings"> <span class="cn_txt"> 离弃,逃离〔某地方、交通工具等〕</span></span><span class="SYN"> <span class="synopp span">SYN</span><a href="bword://leave"> leave</a></span><span class="RELATEDWD"><span class="neutral span">, → </span><a href="bword://abandoned"> abandoned</a></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963497.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">We had to abandon the car and walk the rest of the way.<span class="cn_txt"> 我们只好弃车,步行走完剩下的路。</span></span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963498.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">Fearing further attacks, most of the population had abandoned the city.<span class="cn_txt"> 因为害怕还要受到袭击,大多数市民已逃离该市。</span></span></span></span><a name="abandon__3__a"></a><span class="newline Sense" id="abandon__3"><span class="LDOCE5pp_sensefold"><span class="sensenum span">3</span><span class="foldsign"><span class="foldblank"> </span><span class="foldsignbar1"></span><span class="foldsignbar2"></span></span></span> <span class="ACTIV">STOP DOING something</span><span class="DEF LDOCE_switch_lang switch_siblings">to stop doing something because there are too many 
problems and it is impossible to continue</span><span class="DEF LDOCE_switch_lang switch_siblings"> <span class="cn_txt"> 放弃,中止</span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-000963502.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">The game had to be abandoned due to bad weather.<span class="cn_txt"> 由于天气不好,比赛不得不中止。</span></span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-001732862.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">They <span class="COLLOINEXA">abandoned</span> their <span class="COLLOINEXA">attempt</span> to recapture the castle.<span class="cn_txt"> 他们放弃了夺回城堡的努力。</span></span></span><span class="EXAMPLE"><a class="speaker exafile fa fa-volume-up" href="sound://media/english/exaProns/p008-001776706.mp3" title="Play Example"> </a><span class="english LDOCE_switch_lang switch_children">Because of the fog they <span class="COLLOINEXA">abandoned</span> their <span class="COLLOINEXA"<span>someone, </span><span>you </span></div></div>\n</span>\n
'''
soup = BeautifulSoup(html, 'lxml')
print('{:^80} {:^80}'.format('English', 'Chinese'))
print('-' * 160)
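# Select only the example spans that actually contain a Chinese translation
for english in soup.select('.english:has(.cn_txt)'):
    cn_txt = english.select_one('.cn_txt').get_text(strip=True)
    # Remove the Chinese span from the tree so only the English text remains
    english.select_one('.cn_txt').extract()
    eng_txt = english.get_text(separator=' ', strip=True)
    print('{:<80} {:<80}'.format(eng_txt, cn_txt))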
Prints:
English Chinese
----------------------------------------------------------------------------------------------------------------------------------------------------------------
How could she abandon her own child? 她怎么能抛弃自己的孩子呢?
We had to abandon the car and walk the rest of the way. 我们只好弃车,步行走完剩下的路。
Fearing further attacks, most of the population had abandoned the city. 因为害怕还要受到袭击,大多数市民已逃离该市。
The game had to be abandoned due to bad weather. 由于天气不好,比赛不得不中止。
They abandoned their attempt to recapture the castle. 他们放弃了夺回城堡的努力。
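The question asked for each English sentence and its Chinese translation on separate lines. As a minimal sketch of that variant, the snippet below collects the pairs without removing anything from the parsed tree. The names `sample_html` and `extract_pairs` are hypothetical, the sample is a shortened stand-in for the entry above, and the built-in `html.parser` is used here instead of `lxml` only so the sketch has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample that mirrors the structure of the entry above.
sample_html = '''
<span class="EXAMPLE"><span class="english LDOCE_switch_lang switch_children">How could she abandon her own child?<span class="cn_txt"> 她怎么能抛弃自己的孩子呢?</span></span></span>
<span class="EXAMPLE"><span class="english LDOCE_switch_lang switch_children">They <span class="COLLOINEXA">abandoned</span> their <span class="COLLOINEXA">attempt</span> to recapture the castle.<span class="cn_txt"> 他们放弃了夺回城堡的努力。</span></span></span>
'''


def extract_pairs(html):
    """Return (english, chinese) tuples without modifying the parsed tree."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for english in soup.select('.english'):
        cn_tag = english.select_one('.cn_txt')
        if cn_tag is None:
            # Skip examples that have no Chinese translation
            continue
        chinese = cn_tag.get_text(strip=True)
        # Keep only the text nodes that do not sit inside the .cn_txt span
        parts = [
            s for s in english.find_all(string=True)
            if not any(p is cn_tag for p in s.parents)
        ]
        # Join the pieces and normalize runs of whitespace
        eng = ' '.join(''.join(parts).split())
        pairs.append((eng, chinese))
    return pairs


pairs = extract_pairs(sample_html)
for eng, cn in pairs:
    print(eng)
    print(cn)
```

Because the tree is left intact, the same `soup` could still be queried afterwards for other fields such as the definitions; the answer's `extract()` approach is shorter but mutates the document as it goes.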
Regarding "python - How to extract specific text from HTML using BeautifulSoup in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59348483/