
python - How to select and extract text between two elements?

Reposted. Author: 行者123. Updated: 2023-11-28 21:40:09

I am trying to scrape this website using Scrapy. The page structure looks like this:

<div class="list">
<a id="follows" name="follows"></a>
<h4 class="li_group">Follows</h4>
<div class="soda odd"><a href="...">Star Trek</a></div>
<div class="soda even"><a href="...">...</a></div>
<div class="soda odd"><a href="..">Star Trek: The Motion Picture</a></div>
<div class="soda even"><a href="..">Star Trek II: The Wrath of Khan</a></div>
<div class="soda odd"><a href="..">Star Trek III: The Search for Spock</a></div>
<div class="soda even"><a href="..">Star Trek IV: The Voyage Home</a></div>
<a id="followed_by" name="followed_by"></a>
<h4 class="li_group">Followed by</h4>
<div class="soda odd"><a href="..">Star Trek V: The Final Frontier</a></div>
<div class="soda even"><a href="..">Star Trek VI: The Undiscovered Country</a></div>
<div class="soda odd"><a href="..">Star Trek: Deep Space Nine</a></div>
<div class="soda even"><a href="..">Star Trek: Generations</a></div>
<div class="soda odd"><a href="..">Star Trek: Voyager</a></div>
<div class="soda even"><a href="..">First Contact</a></div>
<a id="spin_off" name="spin_off"></a>
<h4 class="li_group">Spin-off</h4>
<div class="soda odd"><a href="..">Star Trek: The Next Generation - The Transinium Challenge</a></div>
<div class="soda even"><a href="..">A Night with Troi</a></div>
<div class="soda odd"><a href="..">Star Trek: Deep Space Nine</a></div>
</div>

I want to select and extract the text between <h4 class="li_group">Follows</h4> and <h4 class="li_group">Followed by</h4>, and then the text between <h4 class="li_group">Followed by</h4> and <h4 class="li_group">Spin-off</h4>.
I tried this code:

def parse(self, response):
    for sel in response.css("div.list"):
        item = ImdbcoItem()
        item['Follows'] = sel.css("a#follows+h4.li_group ~ div a::text").extract()
        item['Followed_by'] = sel.css("a#followed_by+h4.li_group ~ div a::text").extract()
        item['Spin_off'] = sel.css("a#spin_off+h4.li_group ~ div a::text").extract()
        return item

But the first item extracts all the divs, not just the divs between <h4 class="li_group">Follows</h4> and <h4 class="li_group">Followed by</h4>.
Any help would be much appreciated!

Best answer

You can try the following XPath expressions to get

  • all text nodes of the "Follows" block:

    //div[./preceding-sibling::h4[1]="Follows"]//text()
  • all text nodes of the "Followed by" block:

    //div[./preceding-sibling::h4[1]="Followed by"]//text()
  • all text nodes of the "Spin-off" block:

    //div[./preceding-sibling::h4[1]="Spin-off"]//text()
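
The CSS attempt in the question over-matches because the `~` combinator selects every following sibling div, whereas `preceding-sibling::h4[1]` looks only at the nearest h4 before each div. In a Scrapy callback the expressions above would normally be passed to `response.xpath(...)`. The sketch below is not Scrapy code: it is a dependency-free illustration of the same grouping rule, using only the standard library. The `SAMPLE` fragment is a shortened copy of the page structure with placeholder `href` values, and `group_by_heading` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the page fragment; the href values are placeholders (assumption).
SAMPLE = """
<div class="list">
  <a id="follows" name="follows"></a>
  <h4 class="li_group">Follows</h4>
  <div class="soda odd"><a href="#">Star Trek</a></div>
  <div class="soda even"><a href="#">Star Trek: The Motion Picture</a></div>
  <a id="followed_by" name="followed_by"></a>
  <h4 class="li_group">Followed by</h4>
  <div class="soda odd"><a href="#">Star Trek V: The Final Frontier</a></div>
  <a id="spin_off" name="spin_off"></a>
  <h4 class="li_group">Spin-off</h4>
  <div class="soda odd"><a href="#">A Night with Troi</a></div>
</div>
"""


def group_by_heading(fragment):
    """Map each h4 heading to the text found in the div siblings that follow it."""
    root = ET.fromstring(fragment)
    groups = {}
    current = None  # text of the nearest preceding h4, like preceding-sibling::h4[1]
    for child in root:
        if child.tag == "h4":
            current = (child.text or "").strip()
            groups[current] = []
        elif child.tag == "div" and current is not None:
            # itertext() visits every descendant text node, similar to //text()
            groups[current].extend(t.strip() for t in child.itertext() if t.strip())
    return groups


groups = group_by_heading(SAMPLE)
for heading, titles in groups.items():
    print(heading, titles)
```

Each div is assigned to exactly one heading, so the "Follows" list stops where the "Followed by" heading begins. This only works here because the sample is well-formed XML; real pages usually need an HTML-tolerant parser such as the one behind Scrapy's selectors.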

For "python - How to select and extract text between two elements?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45957062/
