xml - Converting complex XML to TSV with XSLT

Reposted. Author: 数据小太阳. Updated: 2023-10-29 02:53:23

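I found several earlier questions that each solve part of my problem (see here and here), but I am having trouble combining them. I have a set of XML records that I want to convert to a tab-separated format. However, not every record contains every field, and some records contain multiple instances of the same field.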

Two sample XML records:

<?xml version="1.0" encoding="UTF-8" ?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<marc:record>
<marc:leader>02179 am a 002893u </marc:leader>
<marc:controlfield tag="001">12789</marc:controlfield>
<marc:controlfield tag="005">20120521</marc:controlfield>
<marc:controlfield tag="007">cuuuu---auuuu</marc:controlfield>
<marc:controlfield tag="008">120521s|||| xx o 0 u ||| |</marc:controlfield>
<marc:datafield tag="020" ind1=" " ind2=" ">
<marc:subfield code="a">9789089640574</marc:subfield>
</marc:datafield>
<marc:datafield tag="100" ind1="1" ind2=" ">
<marc:subfield code="a">Rooij van ,Robert</marc:subfield>
<marc:subfield code="4">aut</marc:subfield>
</marc:datafield>
<marc:datafield tag="245" ind1="1" ind2=" ">
<marc:subfield code="a">New Perspectives on Games and Interaction</marc:subfield>
</marc:datafield>
<marc:datafield tag="260" ind1=" " ind2=" ">
<marc:subfield code="b">Amsterdam University Press</marc:subfield>
<marc:subfield code="c">2008</marc:subfield>
</marc:datafield>
<marc:datafield tag="300" ind1=" " ind2=" ">
<marc:subfield code="a">1 electronic resource (330 p.)</marc:subfield>
</marc:datafield>
<marc:datafield tag="520" ind1=" " ind2=" ">
<marc:subfield code="a">This volume is a collection of papers ...</marc:subfield>
</marc:datafield>
<marc:datafield tag="650" ind1=" " ind2="0">
<marc:subfield code="a">Mathematics</marc:subfield>
</marc:datafield>
<marc:datafield tag="650" ind1=" " ind2="0">
<marc:subfield code="a">Philosophy (General)</marc:subfield>
</marc:datafield>
<marc:datafield tag="650" ind1=" " ind2="0">
<marc:subfield code="a">Economic theory. Demography</marc:subfield>
</marc:datafield>
<marc:datafield tag="653" ind1=" " ind2=" ">
<marc:subfield code="a">Economics</marc:subfield>
</marc:datafield>
<marc:datafield tag="653" ind1=" " ind2=" ">
<marc:subfield code="a">Philosophy</marc:subfield>
</marc:datafield>
<marc:datafield tag="653" ind1=" " ind2=" ">
<marc:subfield code="a">Mathematics</marc:subfield>
</marc:datafield>
<marc:datafield tag="653" ind1=" " ind2=" ">
<marc:subfield code="a">Economie</marc:subfield>
</marc:datafield>
<marc:datafield tag="653" ind1=" " ind2=" ">
<marc:subfield code="a">Filosofie</marc:subfield>
</marc:datafield>
<marc:datafield tag="653" ind1=" " ind2=" ">
<marc:subfield code="a">Wiskunde</marc:subfield>
</marc:datafield>
<marc:datafield tag="700" ind1="1" ind2=" ">
<marc:subfield code="a">Apt ,Krzysztof</marc:subfield>
<marc:subfield code="4">aut</marc:subfield>
</marc:datafield>
<marc:datafield tag="856" ind1="4" ind2="0">
<marc:subfield code="u">http://www.doabooks.org/doab?func=fulltext&amp;rid=12789</marc:subfield>
<marc:subfield code="z">Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc)</marc:subfield>
</marc:datafield>
<marc:datafield tag="856" ind1="4" ind2="0">
<marc:subfield code="u">http://www.oapen.org/download?type=document&amp;docid=340074</marc:subfield>
</marc:datafield>
</marc:record>
<marc:record>
<marc:leader>01452 am a 001933u </marc:leader>
<marc:controlfield tag="001">15497</marc:controlfield>
<marc:controlfield tag="005">20140217</marc:controlfield>
<marc:controlfield tag="007">cuuuu---auuuu</marc:controlfield>
<marc:controlfield tag="008">140217s|||| xx o 0 u ||| |</marc:controlfield>
<marc:datafield tag="020" ind1=" " ind2=" ">
<marc:subfield code="a">9788867050673</marc:subfield>
</marc:datafield>
<marc:datafield tag="100" ind1="1" ind2=" ">
<marc:subfield code="a">Emanuele Haus</marc:subfield>
<marc:subfield code="4">aut</marc:subfield>
</marc:datafield>
<marc:datafield tag="245" ind1="1" ind2=" ">
<marc:subfield code="a">Dynamics of an elastic satellite with internal friction.</marc:subfield>
</marc:datafield>
<marc:datafield tag="260" ind1=" " ind2=" ">
<marc:subfield code="b">Ledizioni - LediPublishing</marc:subfield>
<marc:subfield code="c">2013</marc:subfield>
</marc:datafield>
<marc:datafield tag="300" ind1=" " ind2=" ">
<marc:subfield code="a">1 electronic resource ( p.)</marc:subfield>
</marc:datafield>
<marc:datafield tag="520" ind1=" " ind2=" ">
<marc:subfield code="a">n this thesis, we study the dynamics...</marc:subfield>
</marc:datafield>
<marc:datafield tag="546" ind1=" " ind2=" ">
<marc:subfield code="a">english</marc:subfield>
</marc:datafield>
<marc:datafield tag="650" ind1=" " ind2="0">
<marc:subfield code="a">Mathematics</marc:subfield>
</marc:datafield>
<marc:datafield tag="856" ind1="4" ind2="0">
<marc:subfield code="u">http://www.doabooks.org/doab?func=fulltext&amp;rid=15497</marc:subfield>
<marc:subfield code="z">Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa)</marc:subfield>
</marc:datafield>
<marc:datafield tag="856" ind1="4" ind2="0">
<marc:subfield code="u">http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf</marc:subfield>
</marc:datafield>
</marc:record>
</marc:collection>

I have been trying to adapt the XSLT from this previous answer, so far without much luck:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.loc.gov/MARC21/slim">
<xsl:output method="text"/>
<xsl:variable name="delimiter" select="'&#09;'"/>

<xsl:strip-space elements="*"/>
<xsl:output method="text"/>

<xsl:key name="field"
match="/collection/record/datafield/subfield"
use="concat(../@tag,@code)"/>

<!-- variable containing the first occurrence of each field -->
<xsl:variable name="allFields"
select="/collection/record/datafield/subfield
[generate-id()
=generate-id(key('field',
concat(../@tag,@code))[1])]" />

<xsl:template match="/">

<xsl:for-each select="$allFields">
<xsl:sort select="substring(concat(../@tag,@code),1,3)"
data-type="number"/>
<xsl:value-of select="concat(../@tag,@code)" />
<xsl:if test="position() &lt; last()">
<xsl:value-of select="$delimiter" />
</xsl:if>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
<xsl:apply-templates select="*/*" />
</xsl:template>

<xsl:template match="*">
<xsl:variable name="this" select="." />

<xsl:for-each select="$allFields">
<xsl:sort
select="substring(concat(../@tag,@code),1,3)"
data-type="number"/>
<xsl:value-of
select="$this/*[@code = current()/@code]" />
<xsl:if test="position() &lt; last()">
<xsl:value-of select="$delimiter" />
</xsl:if>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>

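In the output I am trying to produce, the header row would consist of leader followed by the unique @tag values (combined with subfield/@code for subfields), sorted by tag in ascending order: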

leader  001 005 007 008 020a    100a    1004    245a    260b    260c    300a    520a    546a    650a    653a    700a    7004    856u    856z

If a record has multiple values for a single field/subfield combination, I want to join them together, for example:

653a
Economics|Philosophy|Mathematics

However, if a record lacks a particular field, I just want to output a tab so that all the columns stay aligned.

Complete sample TSV output:

leader  001 005 007 008 020a    100a    1004    245a    260b    260c    300a    520a    546a    650a    653a    700a    7004    856u    856z                                        
02179 am a 002893u 12789 20120521 cuuuu---auuuu 120521s|||| xx o 0 u ||| | 9789089640574 Rooij van ,Robert aut New Perspectives on Games and Interaction Amsterdam University Press 2008 1 electronic resource (330 p.) This volume is a collection of papers Mathematics|Philosophy (General)|Economic theory. Demography Economics|Philosophy|Mathematics|Economie|Filosofie|Wiskunde Apt ,Krzysztof aut http://www.doabooks.org/doab?func=fulltext&amp;rid=12789|http://www.oapen.org/download?type=document&amp;docid=340074 Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc)
01452 am a 001933u 15497 20140217 cuuuu---auuuu 140217s|||| xx o 0 u ||| | 9788867050673 Emanuele Haus aut Dynamics of an elastic satellite with internal friction. Ledizioni - LediPublishing 2013 1 electronic resource ( p.) In this thesis, we study the dynamics of an elastic body english Mathematics http://www.doabooks.org/doab?func=fulltext&amp;rid=15497|http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa)

Best answer

I suggest you try it this way:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:marc="http://www.loc.gov/MARC21/slim"
exclude-result-prefixes="marc">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:variable name="fields">
<xsl:for-each-group select="/marc:collection/marc:record/marc:datafield" group-by="@tag">
<xsl:sort select="@tag"/>
<xsl:for-each select="marc:subfield">
<xsl:sort/>
<field tag="{current-grouping-key()}" code="{@code}">a</field>
</xsl:for-each>
</xsl:for-each-group>
</xsl:variable>

<xsl:template match="/">
<!-- header -->
<xsl:for-each select="$fields/field">
<xsl:value-of select="@tag"/>
<xsl:value-of select="@code"/>
<xsl:if test="position()!=last()">
<xsl:text>&#9;</xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
<!-- data -->
<xsl:for-each select="marc:collection/marc:record">
<xsl:variable name="current-record" select="." />
<xsl:for-each select="$fields/field">
<xsl:value-of select="$current-record/marc:datafield[@tag=current()/@tag]/marc:subfield[@code=current()/@code]" separator="|"/>
<xsl:if test="position()!=last()">
<xsl:text>&#9;</xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:if test="position()!=last()">
<xsl:text>&#10;</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Result when applied to the sample input:

020a    100a    1004    245a    260c    260b    300a    520a    546a    650a    653a    700a    7004    856z    856u
9789089640574 Rooij van ,Robert aut New Perspectives on Games and Interaction 2008 Amsterdam University Press 1 electronic resource (330 p.) This volume is a collection of papers ... Mathematics|Philosophy (General)|Economic theory. Demography Economics|Philosophy|Mathematics|Economie|Filosofie|Wiskunde Apt ,Krzysztof aut Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc) http://www.doabooks.org/doab?func=fulltext&rid=12789|http://www.oapen.org/download?type=document&docid=340074
9788867050673 Emanuele Haus aut Dynamics of an elastic satellite with internal friction. 2013 Ledizioni - LediPublishing 1 electronic resource ( p.) n this thesis, we study the dynamics... english Mathematics Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa) http://www.doabooks.org/doab?func=fulltext&rid=15497|http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf

Note: I could not work out what role the leader is supposed to play in either the input or the output.
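
The stylesheet above omits the leader and controlfield columns that the question asks for. Its column order also depends on subfield text, because `<xsl:sort/>` without a `select` sorts on the string value of each subfield, which is why 260c precedes 260b in the result. In addition, it only inspects the first datafield of each tag when collecting columns. The following is a minimal, untested sketch of one way to address these points in XSLT 2.0. The variable names `records`, `control-fields`, `data-fields` and `rec` are illustrative assumptions, and it assumes tags are three-character, zero-padded strings so that a plain string sort gives ascending tag order.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="marc">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- All input records; kept in a variable so they stay reachable
       while iterating over the temporary <field> trees below -->
  <xsl:variable name="records" select="/marc:collection/marc:record"/>

  <!-- One <field> per distinct controlfield tag, ascending by tag -->
  <xsl:variable name="control-fields">
    <xsl:for-each-group select="$records/marc:controlfield" group-by="@tag">
      <xsl:sort select="@tag"/>
      <field tag="{@tag}"/>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- One <field> per distinct tag+code pair, taken over ALL subfields.
       The sort is stable, so codes within a tag keep first-seen order. -->
  <xsl:variable name="data-fields">
    <xsl:for-each-group select="$records/marc:datafield/marc:subfield"
                        group-by="concat(../@tag, @code)">
      <xsl:sort select="../@tag"/>
      <field tag="{../@tag}" code="{@code}"/>
    </xsl:for-each-group>
  </xsl:variable>

  <xsl:template match="/">
    <!-- header row: leader, controlfield tags, then tag+code pairs -->
    <xsl:value-of separator="&#9;"
        select="'leader',
                $control-fields/field/@tag,
                $data-fields/field/concat(@tag, @code)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- one row per record; a missing field yields an empty cell -->
    <xsl:for-each select="$records">
      <xsl:variable name="rec" select="."/>
      <xsl:value-of select="marc:leader"/>
      <xsl:for-each select="$control-fields/field">
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of separator="|"
            select="$rec/marc:controlfield[@tag = current()/@tag]"/>
      </xsl:for-each>
      <xsl:for-each select="$data-fields/field">
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of separator="|"
            select="$rec/marc:datafield[@tag = current()/@tag]
                        /marc:subfield[@code = current()/@code]"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

On the sample input this should produce the header order requested in the question (leader, 001 ... 008, 020a ... 856u, 856z), with 546a left empty for the first record. Grouping on the subfields themselves, rather than on the first datafield per tag, means a code that only appears in a later occurrence of a tag still gets its own column.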

Regarding xml - converting complex XML to TSV with XSLT, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27319143/
