topic-modeling - Mallet topic modeling example fails to compile

Reposted by: 行者123 | Updated: 2023-12-04 11:05:55

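I want to use Mallet from my own Java code (rather than from the command line), so I added the jar to my project and used the code from the example at http://mallet.cs.umass.edu/topics-devel.php. However, when I run this code, I get the following error: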

Exception in thread "main" java.lang.NoClassDefFoundError: gnu/trove/TObjectIntHashMap
    at cc.mallet.types.Alphabet.<init>(Alphabet.java:51)
    at cc.mallet.types.Alphabet.<init>(Alphabet.java:70)
    at cc.mallet.pipe.TokenSequence2FeatureSequence.<init>(TokenSequence2FeatureSequence.java:35)
    at mallet.TopicModel.main(TopicModel.java:25)
Caused by: java.lang.ClassNotFoundException: gnu.trove.TObjectIntHashMap
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 4 more

I'm not sure what is causing the error. Can anyone help?

package mallet;

import cc.mallet.util.*;
import cc.mallet.types.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.*;
import cc.mallet.topics.*;

import java.util.*;
import java.util.regex.*;
import java.io.*;

public class TopicModel {

public static void main(String[] args) throws Exception {

    String filePath = "D:/ap.txt";
    // Begin by importing documents from text to feature sequences
    ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

    // Pipes: lowercase, tokenize, remove stopwords, map to features
    pipeList.add( new CharSequenceLowercase() );
    pipeList.add( new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) );
    pipeList.add( new TokenSequenceRemoveStopwords(new File("stoplists/en.txt"), "UTF-8", false, false, false) );
    pipeList.add( new TokenSequence2FeatureSequence() );

    InstanceList instances = new InstanceList (new SerialPipes(pipeList));

    Reader fileReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8");
    instances.addThruPipe(new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                                           3, 2, 1)); // data, label, name fields

    // Create a model with 100 topics, alpha_t = 0.01, beta_w = 0.01
    //  Note that the first parameter is passed as the sum over topics
    //  (alphaSum = 1.0 = 100 * 0.01), while the second is the parameter
    //  for a single dimension of the Dirichlet prior.
    int numTopics = 100;
    ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01);

    model.addInstances(instances);

    // Use two parallel samplers, which each look at one half the corpus and combine
    //  statistics after every iteration.
    model.setNumThreads(2);

    // Run the model for 50 iterations and stop (this is for testing only, 
    //  for real applications, use 1000 to 2000 iterations)
    model.setNumIterations(50);
    model.estimate();

    // Show the words and topics in the first instance

    // The data alphabet maps word IDs to strings
    Alphabet dataAlphabet = instances.getDataAlphabet();

    FeatureSequence tokens = (FeatureSequence) model.getData().get(0).instance.getData();
    LabelSequence topics = model.getData().get(0).topicSequence;

    Formatter out = new Formatter(new StringBuilder(), Locale.US);
    for (int position = 0; position < tokens.getLength(); position++) {
        out.format("%s-%d ", dataAlphabet.lookupObject(tokens.getIndexAtPosition(position)), topics.getIndexAtPosition(position));
    }
    System.out.println(out);

    // Estimate the topic distribution of the first instance, 
    //  given the current Gibbs state.
    double[] topicDistribution = model.getTopicProbabilities(0);

    // Get an array of sorted sets of word ID/count pairs
    ArrayList<TreeSet<IDSorter>> topicSortedWords = model.getSortedWords();

    // Show top 5 words in topics with proportions for the first document
    for (int topic = 0; topic < numTopics; topic++) {
        Iterator<IDSorter> iterator = topicSortedWords.get(topic).iterator();

        out = new Formatter(new StringBuilder(), Locale.US);
        out.format("%d\t%.3f\t", topic, topicDistribution[topic]);
        int rank = 0;
        while (iterator.hasNext() && rank < 5) {
            IDSorter idCountPair = iterator.next();
            out.format("%s (%.0f) ", dataAlphabet.lookupObject(idCountPair.getID()), idCountPair.getWeight());
            rank++;
        }
        System.out.println(out);
    }

    // Create a new instance with high probability of topic 0
    StringBuilder topicZeroText = new StringBuilder();
    Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator();

    int rank = 0;
    while (iterator.hasNext() && rank < 5) {
        IDSorter idCountPair = iterator.next();
        topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID()) + " ");
        rank++;
    }

    // Create a new instance named "test instance" with empty target and source fields.
    InstanceList testing = new InstanceList(instances.getPipe());
    testing.addThruPipe(new Instance(topicZeroText.toString(), null, "test instance", null));

    TopicInferencer inferencer = model.getInferencer();
    double[] testProbabilities = inferencer.getSampledDistribution(testing.get(0), 10, 1, 5);
    System.out.println("0\t" + testProbabilities[0]);
}

}

Best answer

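I solved the problem.
First, I tried importing trove 3.1 into Eclipse, but that did not work.
Then I noticed that the Mallet folder contains a "lib" folder, so I added all of the jar files in it to my Eclipse project. Bingo! It works.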
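
A NoClassDefFoundError like the one in the question means the code compiled but a class it depends on was missing from the classpath at run time. The sketch below is one way to check this before running the topic model. The class name `ClasspathCheck` and the method `isOnClasspath` are made up for illustration; only the class name `gnu.trove.TObjectIntHashMap` comes from the stack trace. As far as I know, Trove 3.x moved this class into a different package (`gnu.trove.map.hash`), which would explain why adding trove 3.1 did not help, while the older Trove jar shipped in Mallet's lib folder does contain it. Treat that as an assumption and verify it against the jars you actually have.

```java
import java.io.File;

// Diagnostic sketch (hypothetical helper, not part of Mallet):
// reports whether a class can be loaded and lists the classpath entries.
class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in the question
        String needed = "gnu.trove.TObjectIntHashMap";
        System.out.println(needed + " on classpath: " + isOnClasspath(needed));

        // Print each classpath entry so a missing jar is easy to spot
        String classpath = System.getProperty("java.class.path");
        for (String entry : classpath.split(File.pathSeparator)) {
            System.out.println("  " + entry);
        }
    }
}
```

If the check prints false when run from the same Eclipse project, the jars from Mallet's lib folder are most likely not on the build path yet.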

Regarding "topic-modeling - Mallet topic modeling example fails to compile", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25356870/
