
xml - Word frequency counter in XSLT

Reposted. Author: 行者123. Updated: 2023-12-04 04:45:25

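I am trying to write a word frequency counter in XSLT, and I want it to skip stop words. I started from an example in Michael Kay's book, but I am having trouble getting the stop words to work.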

This code works with any source XML file.

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
<xsl:variable name="stopwords" select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>
<wordcount>
<xsl:for-each-group group-by="." select="
for $w in //text()/tokenize(., '\W+')[not(.=$stopwords)] return $w">
<word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
</xsl:for-each-group>
</wordcount>
</xsl:template>

</xsl:stylesheet>

I think not(.=$stopwords) is where my problem lies, but I don't know how to fix it.

I would also appreciate hints on how to load the stop words from an external file.

Best answer

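Your $stopwords variable is currently a single string; you want it to be a sequence of strings. As written, each token is compared against the whole stop list as one long string, so no token ever matches and nothing is filtered out. You can fix this in any of the following ways: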

  • Change its declaration to
    <xsl:variable name="stopwords" 
    select="('a', 'about', 'an', 'are', 'as', 'at',
    'be', 'by', 'for', 'from', 'how',
    'I', 'in', 'is', 'it',
    'of', 'on', 'or',
    'that', 'the', 'this', 'to',
    'was', 'what', 'when', 'where',
    'who', 'will', 'with')"/>
  • Change its declaration to
    <xsl:variable name="stopwords" 
    select="tokenize('a about an are as at
    be by for from how I in is it
    of on or that the this to was
    what when where who will with',
    '\s+')"/>
  • Read it from an external XML document named (for example) stoplist.xml, of the form
    <stop-list>
    <p>This is a sample stop list [further description ...]</p>
    <w>a</w>
    <w>about</w>
    ...
    </stop-list>

    and then load it, for example with
    <xsl:variable name="stopwords"
    select="document('stoplist.xml')//w/string()"/>
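Putting the pieces together, a minimal sketch of the corrected stylesheet might look like the following. It uses the tokenize() variant from the second option. Moving the variable to the top level and adding the [.] predicate (which drops empty tokens) are choices made for this illustration, not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Stop words as a sequence of strings rather than one long string. -->
  <xsl:variable name="stopwords"
      select="tokenize('a about an are as at be by for from how I in is it of on or that the this to was what when where who will with', '\s+')"/>

  <xsl:template match="/">
    <wordcount>
      <!-- '. = $stopwords' is a general comparison: it is true when the
           token equals ANY item in the sequence.
           [.] keeps only non-empty tokens (assumption: tokenize() can
           yield empty strings when a text node starts or ends with
           non-word characters, and those should not be counted). -->
      <xsl:for-each-group group-by="."
          select="//text()/tokenize(., '\W+')[.][not(. = $stopwords)]">
        <word word="{current-grouping-key()}"
              frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>

</xsl:stylesheet>
```

The comparison is case-sensitive, so "The" would still be counted while "the" is filtered; wrapping both sides in lower-case() is one way to change that if desired. To read the list from an external file instead, the select expression of the variable can be swapped for the document() expression from the third option, leaving the rest unchanged.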
Source: this question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/18280028/
