
python - What would be the right way of doing getallAttributes()?

Reposted. Author: 行者123. Updated: 2023-12-01 04:42:26

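I am trying to read the properties (attributes) of a given element. I want to extract a dictionary of all the attribute name-value pairs.

What I am currently doing is using the XPath expression @* to list all the attribute values. The problem is that this shows only the values of the attributes, not their names: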

# node is a Scrapy selector for the element of interest
attributes = node.xpath("@*")
print(attributes)
print(len(attributes))
for att in attributes:
    print(att)

The sample output looks like this:

<Selector xpath='@*' data=u'1'>
<Selector xpath='@*' data=u'2761554'>
<Selector xpath='@*' data=u'1431756540503'>

Can anyone suggest a way to list all the attributes of an element, names included?
I am using this with python/scrapy.

Best answer

With XPath, you can use name() and pass it the attribute as an argument.

  1. Count the element's attributes with count(@*)
  2. For each attribute position, use @*[position] to extract the name and the value

Example scrapy shell session for meta elements:

$ scrapy shell "https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes"
2015-05-18 10:47:28+0200 [default] DEBUG: Crawled (200) <GET https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f732bf4b190>
[s] item {}
[s] request <GET https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes>
[s] response <200 https://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes>
[s] settings <scrapy.settings.Settings object at 0x7f732bf3ffd0>
[s] spider <DefaultSpider 'default' at 0x7f73268eead0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser

In [1]: import pprint
In [2]: for meta in response.xpath('//meta[@*]'):
   ...:     nbattr = int(float(meta.xpath('count(@*)').extract()[0]))
   ...:     pprint.pprint(dict((meta.xpath('name(@*[%d])' % i).extract()[0], meta.xpath('@*[%d]' % i).extract()[0]) for i in range(1, nbattr+1)))
   ...:     print
   ...:
{u'content': u'summary', u'name': u'twitter:card'}

{u'content': u'stackoverflow.com', u'name': u'twitter:domain'}

{u'content': u'website', u'property': u'og:type'}

{u'content': u'https://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=ea71a5211a91&a',
u'itemprop': u'image primaryImageOfPage',
u'property': u'og:image'}

{u'content': u'what would be the right way of doing getallAttributes()',
u'itemprop': u'title name',
u'name': u'twitter:title',
u'property': u'og:title'}

{u'content': u'I am trying to read the property(attributes) of given element .\n\nI want to extract the Dictionary of all the attributes name-value pair..\n\nwhat i am currently doing is i am using regex and listting...',
u'itemprop': u'description',
u'name': u'twitter:description',
u'property': u'og:description'}

{u'content': u'http://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
u'property': u'og:url'}

{u'content': u'US', u'name': u'twitter:app:country'}

{u'content': u'Stack Exchange iOS', u'name': u'twitter:app:name:iphone'}

{u'content': u'871299723', u'name': u'twitter:app:id:iphone'}

{u'content': u'se-zaphod://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
u'name': u'twitter:app:url:iphone'}

{u'content': u'Stack Exchange iOS', u'name': u'twitter:app:name:ipad'}

{u'content': u'871299723', u'name': u'twitter:app:id:ipad'}

{u'content': u'se-zaphod://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
u'name': u'twitter:app:url:ipad'}

{u'content': u'Stack Exchange Android',
u'name': u'twitter:app:name:googleplay'}

{u'content': u'http://stackoverflow.com/questions/30295249/what-would-be-the-right-way-of-doing-getallattributes',
u'name': u'twitter:app:url:googleplay'}

{u'content': u'com.stackexchange.marvin',
u'name': u'twitter:app:id:googleplay'}
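
The session above is Python 2 and depends on a live page. As a minimal sketch of the same two steps in Python 3, the snippet below uses lxml directly (the library Scrapy's selectors are built on) against a small inline document. The SAMPLE markup and the helper name attributes_via_xpath are made up for illustration and are not part of the original answer.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
SAMPLE = (
    '<root>'
    '<meta name="twitter:card" content="summary"/>'
    '<meta property="og:type" content="website"/>'
    '</root>'
)


def attributes_via_xpath(node):
    """Build a name -> value dict with count(@*), name(@*[i]) and @*[i]."""
    # Step 1: count the attributes; lxml returns the XPath number as a float.
    count = int(node.xpath('count(@*)'))
    result = {}
    # Step 2: XPath positions start at 1, so iterate 1..count inclusive.
    for i in range(1, count + 1):
        name = node.xpath('name(@*[%d])' % i)    # a single string
        value = node.xpath('@*[%d]' % i)[0]      # first item of a one-item list
        result[str(name)] = str(value)
    return result


root = etree.fromstring(SAMPLE)
for meta in root.xpath('//meta[@*]'):
    print(attributes_via_xpath(meta))
```

When working with lxml elements directly, dict(meta.attrib) yields the same pairs without any XPath. Recent Scrapy releases document a comparable attrib property on selectors, so it is worth checking whether the installed version has it before reaching for the positional approach.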

I used a similar technique in this blog post to extract microdata:

>>> for item in selector.xpath('.//*[@itemscope]'):
...     print "Item:", item.xpath('@itemtype').extract()
...     for property in item.xpath('.//*[@itemprop]'):
...         print "Property:",
...         print property.xpath('@itemprop').extract(),
...         print property.xpath('string(.)').extract()
...         for position, attribute in enumerate(property.xpath('@*'), start=1):
...             print "attribute: name=%s; value=%s" % (
...                 property.xpath('name(@*[%d])' % position).extract(),
...                 attribute.extract())
...         print
...     print
...
Item: [u'http://schema.org/Movie']
Property: [u'name'] [u'Avatar']
attribute: name=[u'itemprop']; value=name

Property: [u'director'] [u'\n Director: James Cameron \n(born August 16, 1954)\n ']
attribute: name=[u'itemprop']; value=director
attribute: name=[u'itemscope']; value=
attribute: name=[u'itemtype']; value=http://schema.org/Person

Property: [u'name'] [u'James Cameron']
attribute: name=[u'itemprop']; value=name

Property: [u'birthDate'] [u'August 16, 1954']
attribute: name=[u'itemprop']; value=birthDate
attribute: name=[u'datetime']; value=1954-08-16

Property: [u'genre'] [u'Science fiction']
attribute: name=[u'itemprop']; value=genre

Property: [u'trailer'] [u'Trailer']
attribute: name=[u'href']; value=../movies/avatar-theatrical-trailer.html
attribute: name=[u'itemprop']; value=trailer


Item: [u'http://schema.org/Person']
Property: [u'name'] [u'James Cameron']
attribute: name=[u'itemprop']; value=name

Property: [u'birthDate'] [u'August 16, 1954']
attribute: name=[u'itemprop']; value=birthDate
attribute: name=[u'datetime']; value=1954-08-16

>>>

Regarding "python - What would be the right way of doing getallAttributes()?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30295249/
