
XPath: selecting a node with a condition

Reposted. Author: 行者123. Updated: 2023-12-03 17:03:15

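I am using Scrapy, a Python-based framework, to scrape a site, but I cannot work out how to select the text of the element with class value ellipsis ph. Sometimes that element contains strong tags. So far I have only managed to extract the text that is not inside a strong child tag.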

<div class="right">
<div class="attrs">
<div class="attr">
<span class="name">Main Products:</span>
<div class="value ellipsis ph">
// Here below i needed to select it ignoring the strong tag
<strong>Shoes</strong>
(Sport
<strong>Shoes</strong>
,Casual
<strong>Shoes</strong>
,Hiking
<strong>Shoes</strong>
,Skate
<strong>Shoes</strong>
,Football
<strong>Shoes</strong>
)
</div>
</div>
</div>
</div>


<div class="right">
<div class="attrs">
<div class="attr">
<span class="name">Main Products:</span>
<div class="value ellipsis ph">
Cap, Shoe, Bag // could select this

</div>
</div>
</div>
</div>

Starting from the root of the selected node, the following works, but it only selects the text that is not inside a strong tag:
"/div[@class='right']/div[@class='attrs']/div[@class='attr']/div/text()").extract()

Best answer

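Assuming you want the text representation of the div element with class value ellipsis ph, you can:

  • either select all descendant text nodes, not only the direct child text nodes, using .//text()
  • or use XPath's string() function on the div element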
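
The difference between direct child text nodes and descendant text nodes can be sketched without Scrapy. The snippet below is only an illustration: it swaps in the standard library's xml.etree.ElementTree for Scrapy's selectors (an assumption made so it runs without third-party packages), and the shortened markup is hypothetical. In ElementTree, the element's own text plus the tail of each child corresponds to what div/text() returns, while itertext() walks every descendant and corresponds to .//text() or string().

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the markup from the question.
html = (
    '<div class="value ellipsis ph">'
    '<strong>Shoes</strong> (Sport <strong>Shoes</strong>, '
    'Casual <strong>Shoes</strong>)'
    '</div>'
)
div = ET.fromstring(html)

# Direct child text nodes only, comparable to XPath div/text():
# the text before the first child plus the text following each child.
direct = "".join([div.text or ""] + [child.tail or "" for child in div])

# All descendant text nodes, comparable to .//text() joined, or string():
full = "".join(div.itertext())

print(repr(direct))
print(repr(full))
```

The first value is missing every word wrapped in a strong tag, which is the behaviour described in the question; the second contains the complete text.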

Here are the two options:
    >>> import scrapy
    >>> selector = scrapy.Selector(text="""<div class="right">
    ... <div class="attrs">
    ... <div class="attr">
    ... <span class="name">Main Products:</span>
    ... <div class="value ellipsis ph">
    ... <!-- // Here below i needed to select it ignoring the strong tag -->
    ... <strong>Shoes</strong>
    ... (Sport
    ... <strong>Shoes</strong>
    ... ,Casual
    ... <strong>Shoes</strong>
    ... ,Hiking
    ... <strong>Shoes</strong>
    ... ,Skate
    ... <strong>Shoes</strong>
    ... ,Football
    ... <strong>Shoes</strong>
    ... )
    ... </div>
    ... </div>
    ... </div>
    ... </div>
    ...
    ...
    ... <div class="right">
    ... <div class="attrs">
    ... <div class="attr">
    ... <span class="name">Main Products:</span>
    ... <div class="value ellipsis ph">
    ... Cap, Shoe, Bag <!-- // could select this -->
    ...
    ... </div>
    ... </div>
    ... </div>
    ... </div>""")
    >>> for div in selector.css('div.value.ellipsis.ph'):
    ...     print("---")
    ...     print("".join(div.xpath('.//text()').extract()))
    ...
    ---


    Shoes
    (Sport
    Shoes
    ,Casual
    Shoes
    ,Hiking
    Shoes
    ,Skate
    Shoes
    ,Football
    Shoes
    )

    ---

    Cap, Shoe, Bag


    >>> for div in selector.css('div.value.ellipsis.ph'):
    ...     print("---")
    ...     print(div.xpath('string()').extract_first())
    ...
    ---


    Shoes
    (Sport
    Shoes
    ,Casual
    Shoes
    ,Hiking
    Shoes
    ,Skate
    Shoes
    ,Football
    Shoes
    )

    ---

    Cap, Shoe, Bag


    >>>
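
Both options return the text with the line breaks of the original markup, so some cleanup is usually wanted before storing it. The following is a minimal sketch of one way to do that in plain Python; clean_text is a hypothetical helper name, not part of Scrapy, and the raw string is a hand-written stand-in for the value extracted above. XPath's normalize-space() function could handle the whitespace collapsing on the selector side instead, but it would not remove the space left before the commas.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse whitespace runs and tidy the spacing around punctuation."""
    # Split on any whitespace (including newlines) and rejoin with single spaces.
    text = " ".join(raw.split())
    # Drop the space left before commas and closing parentheses.
    text = re.sub(r"\s+([,)])", r"\1", text)
    # Make sure every comma is followed by exactly one space.
    text = re.sub(r",(?=\S)", ", ", text)
    return text


# Stand-in for the string extracted from the first div in the session above.
raw = (
    "\n\nShoes\n(Sport\nShoes\n,Casual\nShoes\n,Hiking\n"
    "Shoes\n,Skate\nShoes\n,Football\nShoes\n)\n\n"
)
print(clean_text(raw))
print(clean_text("\nCap, Shoe, Bag \n\n"))
```

In a spider, the same helper could be applied to the result of div.xpath('string()').extract_first() before the value is stored in an item.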

For XPath selection of a node with a condition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31337675/
