python - XPath: Select the text of the current and next node by the current node's attribute

Reposted. Author: 行者123. Updated: 2023-12-03 16:21:59

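First of all, this is a spin-off of my previous question. I am posting it again because the person whose answer I accepted in the original post suggested that I do so, as he felt the question had not been properly defined before. Attempt 2: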

I am trying to scrape information from this webpage. For clarity, here is a selected block of the page source:

<p class="titlestyle">ANT101H5 Introduction to Biological Anthropology and Archaeology 
<span class='distribution'>(SCI)</span></p>
<span class='normaltext'>
Anthropology is the global and holistic study of human biology and behaviour, and includes four subfields: biological anthropology, archaeology, sociocultural anthropology and linguistics. The material covered is directed to answering the question: What makes us human? This course is a survey of biological anthropology and archaeology. [<span class='Helpcourse'
onMouseover="showtip(this,event,'24 Lectures')"
onMouseout="hidetip()">24L</span>, <span class='Helpcourse'
onMouseover="showtip(this,event,'12 Tutorials')"
onMouseout="hidetip()">12T</span>]<br>
<span class='title2'>Exclusion: </span><a href='javascript:OpenCourse("WEBCOURSENOTFOUND.html")'>ANT100Y5</a><br>
<span class='title2'>Prerequisite: </span><a href='javascript:OpenCourse("WEBCOURSEANT102H5.pl?fv=1")'>ANT102H5</a><br>



From the sample block above, I want to extract the following information:
  • ANT101H5 Introduction to Biological Anthropology and Archaeology
  • Exclusion: ANT100Y5
  • Prerequisite: ANT102H5

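I want to scrape all such information from the webpage (bearing in mind that some courses may also have "Corequisites" listed, or may have no prerequisites, corequisites, or exclusions listed at all).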

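    I have been trying to write an appropriate XPath expression for this task, but I can't seem to get it quite right.

    So far, with the help of Dimitre Novatchev, I have been able to use the following expression: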
    sites = hxs.select("(//p[@class='titlestyle'])[2]/text()[1] | (//span[@class='title2'])[2]/text() | \
    (//span[@class='title2'])[2]/following-sibling::a[1]/text() | (//span[@class='title2'])[3]/text() | \
    (//span[@class='title2'])[3]/following-sibling::a[1]/text()")

    However, it produces the following output, which appears to contain the information for only the first course on the page:
    [{"desc": "ANT101H5 Introduction to Biological Anthropology and Archaeology \n                        "},
    {"desc": "Exclusion: "},
    {"desc": "ANT100Y5"},
    {"desc": "Prerequisite: "},
    {"desc": "ANT102H5"}]

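    To be absolutely clear, this output is correct only insofar as it contains the right information for the first course. I need the same information for all courses listed on that webpage.

    I am so close, but I can't seem to figure out the last step.

    I would appreciate any help. Thanks in advance.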

    Best answer

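    A single XPath expression that selects the relevant data for all courses would be quite confusing, so here I take a different approach, which can be used (if really necessary) to produce a single XPath expression:

    This simple XSLT transformation: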

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="p[@class='titlestyle']">
        <xsl:text>&#xA;===================&#xA;</xsl:text>
        <xsl:value-of select="text()[1]"/>
      </xsl:template>

      <xsl:template match="span/span[@class='title2'][not(position() > 1)]">
        <xsl:text>&#xA;</xsl:text>
        <xsl:value-of select="."/>
        <xsl:value-of select="following-sibling::a[1]"/>

        <xsl:if test="not(following-sibling::a)">
          <xsl:value-of select="following-sibling::text()[1]"/>
        </xsl:if>
        <xsl:text>&#xA;</xsl:text>
      </xsl:template>

      <xsl:template match="text()"/>
    </xsl:stylesheet>

    when applied to the page at http://www.utm.utoronto.ca/regcal/WEBLISTCOURSES1.html (tidied into a well-formed XML document), produces the wanted result:
    ===================
    Anthropology
    ===================
    ANT101H5 Introduction to Biological Anthropology and Archaeology

    Exclusion: ANT100Y5

    ===================
    ANT102H5 Introduction to Sociocultural and Linguistic Anthropology

    Exclusion: ANT100Y5

    ===================
    ANT200Y5 World Archaeology and Prehistory

    Prerequisite: 101H5

    ===================
    ANT203Y5 Biological Anthropology

    Prerequisite: 101H5

    ===================
    ANT204Y5 Sociocultural Anthropology

    Prerequisite: 101H5

    ===================
    ANT205H5 Introduction to Forensic Anthropology

    Prerequisite: 101H5

    ===================
    ANT206Y5 Culture and Communication: Introduction to Linguistic Anthropology

    Exclusion: ANT206H5

    ===================
    ANT241Y5 Aboriginal Peoples of North America

    ===================
    ANT299Y5 Research Opportunity Program

    ===================
    ANT304H5 Anthropology and Aboriginal Peoples

    Exclusion: ANT304Y5

    ===================
    ANT306H5 Forensic Anthropology Field School

    Prerequisite: ANT205H5

    ===================
    ANT308H5 Case Studies in Archaeological Botany and Zoology

    Prerequisite: ANT200Y5

    ===================
    ANT309H5 Southeast Asian Archaeology

    Prerequisite: ANT200Y5

    ===================
    ANT310H5 Complex Societies

    Prerequisite: ANT200Y5

    ===================
    ANT312H5 Archaeological Analysis

    Prerequisite: ANT200Y5

    ===================
    ANT313H5 China, Korea and Japan in Prehistory

    Prerequisite: ANT200Y5

    ===================
    ANT314H5 Archaeological Theory

    Exclusion: ANT411H5

    ===================
    ANT316H5 South Asian Archaeology

    Prerequisite: ANT200Y5

    ===================
    ANT317H5 Archaeology of Eastern North America

    Prerequisite: ANT200Y5

    ===================
    ANT318H5 Archaeological Fieldwork

    Prerequisite: ANT200Y5

    ===================
    ANT320H5 Archaeological Approaches to Technology

    Prerequisite: ANT200Y5

    ===================
    ANT322H5 Anthropology of Youth Culture

    Exclusion: ANT204Y5

    ===================
    ANT327H5 Agricultural Origins: The Second Revolution

    Prerequisite: ANT200Y5

    ===================
    ANT331H5 The Biology of Human Sexuality

    Exclusion: ANT330H5

    ===================
    ANT332H5 Human Origins

    Exclusion: ANT332Y5

    ===================
    ANT333H5 Human Origins II

    Exclusion: ANT332Y5

    ===================
    ANT334H5 Human Osteology

    Exclusion: ANT334Y5

    ===================
    ANT335H5 Anthropology of Gender

    Exclusion: ANT331Y5

    ===================
    ANT336H5 Molecular Anthropology

    Prerequisite: ANT203Y5

    ===================
    ANT338H5 Laboratory Methods in Biological Anthropology

    Prerequisite: ANT203Y5

    ===================
    ANT339Y5 Human Adaptation through Biological and Cultural Means

    Prerequisite: ANT203Y5

    ===================
    ANT340H5 Osteological Theory

    Exclusion: ANT334Y5

    ===================
    ANT350H5 Globalization and the Changing World of Work

    Prerequisite: ANT204Y5

    ===================
    ANT351H5 Money, Markets, Gifts: Topics in Economic Anthropology

    Prerequisite: ANT204Y5

    ===================
    ANT352H5 Power, Authority, and Legitimacy: Topics in Political Anthropology

    Prerequisite: ANT204Y5

    ===================
    ANT358H5 Ethnographic Methods

    Prerequisite: ANT204Y5

    ===================
    ANT360H5 Anthropology of Religion

    Exclusion: ANT209Y5

    ===================
    ANT361H5 Anthropology of Sub-Saharan Africa

    Exclusion: ANT212Y5

    ===================
    ANT362H5 Language in Culture and Society

    Prerequisite: ANT204Y5

    ===================
    ANT363H5 Magic, Witchcraft and Science

    Prerequisite: ANT360H5

    ===================
    ANT364H5 Lab in Social Interaction

    Prerequisite: ANT206H5

    ===================
    ANT365H5 Semiotic Anthropology

    Prerequisite: ANT204Y5

    ===================
    ANT368H5 World Religions and Ecology

    Exclusion: RLG311H5

    ===================
    ANT369H5 Religious Violence and Nonviolence

    Exclusion: RLG317H5

    ===================
    ANT397H5 Independent Study

    Prerequisite: Permission of Faculty Advisor


    ===================
    ANT398Y5 Independent Reading

    Prerequisite: Permission of Faculty Advisor


    ===================
    ANT399Y5 Research Opportunity Program

    Prerequisite: P.I.


    ===================
    ANT401H5 Vocal and Visual Communication

    Prerequisite: ANT102H5

    ===================
    ANT414H5 People and Plants in Prehistory

    Prerequisite: ANT200Y5

    ===================
    ANT415H5 Faunal Archaeo-Osteology

    Exclusion: ANT415Y5

    ===================
    ANT416H5 Advanced Archaeological Analysis

    Prerequisite: ANT312H5

    ===================
    ANT418H5 Advanced Archaeological Fieldwork

    Prerequisite: ANT318H5

    ===================
    ANT430H5 Special Problems in Biological Anthropology and Archaeology

    Prerequisite: P.I


    ===================
    ANT430Y5 Special Problems in Biological Anthropology and Archaeology

    Prerequisite: P.I.


    ===================
    ANT431Y5 Special Problems in Sociocultural or Linguistic Anthropology

    Prerequisite: P.I.


    ===================
    ANT431H5 Special Problems in Sociocultural or Linguistic Anthropology

    Prerequisite: P.I.


    ===================
    ANT432H5 Special Seminar in Anthropology

    Prerequisite: P.I.


    ===================
    ANT433H5 Genes, Language, Artifact and Mind

    Prerequisite: ANT200Y5

    ===================
    ANT434H5 Palaeopathology

    Prerequisite: ANT334Y5

    ===================
    ANT438H5 The Development of Thought in Biological Anthropology

    Prerequisite: ANT203Y5

    ===================
    ANT439Y5 Advanced Forensic Anthropology

    Prerequisite: ANT205H5

    ===================
    ANT441H5 Advanced Bioarchaeology

    Prerequisite: ANT334H5

    ===================
    ANT457H5 Anthropology and the Environment

    Prerequisite: ANT102H5

    ===================
    ANT458H5 Anthropology of Crime, Law and Order

    Exclusion: ANT204Y5

    ===================
    ANT459H5 The Ethnography of Speaking

    Prerequisite: ANT206Y5

    ===================
    ANT460H5 Theory in Sociocultural Anthropology

    Prerequisite: ANT204Y5

    ===================
    ANT461H5 Emergent Topics in Socio-Cultural &amp; Linguistic Anthropology

    Prerequisite: ANT204Y5

    ===================
    ANT498H5 Advanced Independent Study

    Prerequisite: P.I.


    ===================
    ANT499Y5 Advanced Independent Research

    Prerequisite: P.I.
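
The per-course logic of the stylesheet above (start a new record at each p with class 'titlestyle', then attach each 'title2' label and the value that follows it) could also be sketched in Python. This is only an illustration under stated assumptions, not the answerer's method. It uses the standard-library xml.etree.ElementTree instead of XSLT or Scrapy. SAMPLE is a small hand-tidied, well-formed fragment built from entries shown in this post, not the real page. The function name extract_courses and the record layout are hypothetical. Unlike the stylesheet's pattern, which matches only the first 'title2' span in each block, this sketch keeps every label, and it reads the value from the element or text directly after the label.

```python
import xml.etree.ElementTree as ET

# Assumption: a hand-tidied, well-formed stand-in for the real HTML page.
SAMPLE = """\
<body>
  <p class="titlestyle">Anthropology</p>
  <p class="titlestyle">ANT101H5 Introduction to Biological Anthropology and Archaeology
    <span class="distribution">(SCI)</span></p>
  <span class="normaltext">
    <span class="title2">Exclusion: </span><a>ANT100Y5</a><br/>
    <span class="title2">Prerequisite: </span><a>ANT102H5</a><br/>
  </span>
  <p class="titlestyle">ANT397H5 Independent Study</p>
  <span class="normaltext">
    <span class="title2">Prerequisite: </span>Permission of Faculty Advisor<br/>
  </span>
</body>
"""


def extract_courses(root):
    """Group title2 labels under the most recent titlestyle heading (hypothetical helper)."""
    # ElementTree has no parent pointers or following-sibling axis,
    # so build a child -> parent map to look at siblings.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    courses = []
    for el in root.iter():  # iter() walks elements in document order
        cls = el.get("class")
        if el.tag == "p" and cls == "titlestyle":
            # el.text is the text before the first child element, like text()[1]
            courses.append({"title": (el.text or "").strip(), "requirements": []})
        elif el.tag == "span" and cls == "title2" and courses:
            siblings = list(parent_of[el])
            idx = siblings.index(el)
            nxt = siblings[idx + 1] if idx + 1 < len(siblings) else None
            if nxt is not None and nxt.tag == "a":
                value = (nxt.text or "").strip()   # linked course code
            else:
                value = (el.tail or "").strip()    # plain text after the label
            label = (el.text or "").strip().rstrip(":")
            courses[-1]["requirements"].append((label, value))
    return courses


courses = extract_courses(ET.fromstring(SAMPLE))
for course in courses:
    print(course["title"])
    for label, value in course["requirements"]:
        print(f"  {label}: {value}")
```

Walking the tree once in document order is what keeps each label attached to its own course. The absolute positions in the question's expression, such as (//span[@class='title2'])[2], each select a single fixed node, which is why that expression returns one course only. The real page is HTML, so it would first need to be tidied into well-formed XML, as the answer notes.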

    This post on "python - XPath: Select the text of the current and next node by the current node's attribute" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/5208843/
