
c++ - GIZA++ output is missing the *.ti.final and *.actual.ti.final files

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 07:16:11

I am having trouble understanding the basics of how to run GIZA++.

I went through the discussion on StackOverflow (Is there a tutorial about giza++?) and the links people provided there. I downloaded and compiled the latest GIZA++ from the Moses-SMT GitHub repository.

git clone https://github.com/moses-smt/giza-pp.git
cd giza-pp
make

After it compiled successfully, I wrote a simple script for testing.

#!/bin/bash
SRC=french
TRG=english
PREFIX=out
GIZA=../giza-pp

# Cleaning from previous run ...
rm -f *.log
rm -f *.vcb
rm -f *.snt
rm -f *.vcb.classes
rm -f *.vcb.classes.cats
rm -f *.gizacfg
rm -f *.cooc
rm -f ${PREFIX}*

# Converting plain text into sentence format using the "plain2snt.out" tool ...
${GIZA}/GIZA++-v2/plain2snt.out ${SRC} ${TRG}

# Generating word clusters using the "mkcls" tool ...
${GIZA}/mkcls-v2/mkcls -p${SRC} -V${SRC}.vcb.classes
${GIZA}/mkcls-v2/mkcls -p${TRG} -V${TRG}.vcb.classes

# Generating the cooccurrence file using the "snt2cooc" tool ...
${GIZA}/GIZA++-v2/snt2cooc.out ${SRC}.vcb ${TRG}.vcb ${SRC}_${TRG}.snt > ${SRC}_${TRG}.cooc

# Running "GIZA++" ...
${GIZA}/GIZA++-v2/GIZA++ -S ${SRC}.vcb -T ${TRG}.vcb -C ${SRC}_${TRG}.snt -CoocurrenceFile ${SRC}_${TRG}.cooc -o ${PREFIX} >> giza.log 2>&1

This is the content of my directory after running the script.

jakub@jakub-virtual-machine:~/Master/giza-pp_test$ ls
english french_english.snt out.d3.final out.perp
english_french.snt french.vcb out.d4.final out.t3.final
english.vcb french.vcb.classes out.D4.final out.trn.src.vcb
english.vcb.classes french.vcb.classes.cats out.Decoder.config out.trn.trg.vcb
english.vcb.classes.cats giza.log out.gizacfg out.tst.src.vcb
french out.a3.final out.n3.final out.tst.trg.vcb
french_english.cooc out.A3.final out.p0_3.final run_test.sh

The point is that the output is missing the files listed below, which are the ones that matter to me.

out.ti.final
out.actual.ti.final
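A quick way to confirm which of these files a run produced is a small check at the end of the test script. The sketch below is only an illustration: the function name `check_giza_outputs` is hypothetical, and it assumes the files are named `PREFIX.ti.final` and `PREFIX.actual.ti.final`, as in the question.

```shell
#!/bin/bash
# Report which of the expected inverse t-table files exist for a given prefix.
# Returns 0 only if every listed file is present, 1 otherwise.
check_giza_outputs() {
    local prefix="$1"
    local missing=0
    local suffix
    for suffix in ti.final actual.ti.final; do
        if [ -f "${prefix}.${suffix}" ]; then
            echo "found:   ${prefix}.${suffix}"
        else
            echo "missing: ${prefix}.${suffix}"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_giza_outputs "${PREFIX}"` right after the GIZA++ line makes the script fail visibly when the tables are not written, instead of relying on a manual `ls`.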

I then looked at GIZA's Main.cpp (lines 260-273) and can see the lines that should create these files.

cerr << "writing Final tables to Disk \n";
string t_inv_file = Prefix + ".ti.final" ;
if( !FEWDUMPS)
  m1.getTTable().printProbTableInverse(t_inv_file.c_str(), m1.getEnglishVocabList(),
                                       m1.getFrenchVocabList(),
                                       m1.getETotalWCount(),
                                       m1.getFTotalWCount());
t_inv_file = Prefix + ".actual.ti.final" ;
if( !FEWDUMPS )
  m1.getTTable().printProbTableInverse(t_inv_file.c_str(),
                                       eTrainVcbList.getVocabList(),
                                       fTrainVcbList.getVocabList(),
                                       m1.getETotalWCount(),
                                       m1.getFTotalWCount(), true);
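The quoted code only tests `FEWDUMPS` at run time, so when the calls are reached but no file appears, a compile-time switch is one plausible suspect. The sketch below lists every source line that mentions a given macro so the relevant `#ifdef` blocks can be read directly. The function name `find_macro_uses` is hypothetical, and the idea that `BINARY_SEARCH_FOR_TTABLE` affects `printProbTableInverse` is an assumption to verify in the sources, not something this snippet proves.

```shell
#!/bin/bash
# Print file name, line number and text of every line under src_dir
# that mentions the given preprocessor macro.
# src_dir: directory with the GIZA++ sources (for example giza-pp/GIZA++-v2)
# macro:   macro name to search for (defaults to BINARY_SEARCH_FOR_TTABLE)
find_macro_uses() {
    local src_dir="$1"
    local macro="${2:-BINARY_SEARCH_FOR_TTABLE}"
    grep -rn -- "$macro" "$src_dir"
}
```

For example, `find_macro_uses ../giza-pp/GIZA++-v2` shows where the macro is defined in the Makefile and where the code branches on it.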

The "cerr" line is also printed in my log, but I cannot figure out why these files are missing from the output.

jakub@jakub-virtual-machine:~/Master/giza-pp_test$ cat giza.log | tail -n 25
p0_count is 4.0073 and p1 is 5.99635; p0 is 0.400584 p1: 0.599416
Model4: TRAIN CROSS-ENTROPY 0.80096 PERPLEXITY 1.74226
Model4: (10) TRAIN VITERBI CROSS-ENTROPY 0.801289 PERPLEXITY 1.74266
Dumping alignment table (a) to file:out.a3.final
Dumping distortion table (d) to file:out.d3.final
Dumping nTable to: out.n3.final

Model4 Viterbi Iteration : 10 took: 0 seconds
H3333344444 Training Finished at: Fri Oct 23 16:24:44 2015


Entire Viterbi H3333344444 Training took: 0 seconds
==========================================================
writing Final tables to Disk
Writing PERPLEXITY report to: out.perp
Writing source vocabulary list to : out.trn.src.vcb
Writing source vocabulary list to : out.trn.trg.vcb
Writing source vocabulary list to : out.tst.src.vcb
Writing source vocabulary list to : out.tst.trg.vcb
writing decoder configuration file to out.Decoder.config

Entire Training took: 0 seconds
Program Finished at: Fri Oct 23 16:24:44 2015

==========================================================

Has anyone run into a similar problem? Is this some kind of bug, or am I doing something wrong?

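Edit:

I have now recompiled all of GIZA++ without the -DBINARY_SEARCH_FOR_TTABLE option in the Makefile's CFLAGS, and changed the script so that it no longer generates a cooccurrence file or passes one to GIZA++. After re-running the script, the output does contain out.actual.ti.final and out.ti.final. Can anyone explain this behavior? I thought that using the cooccurrence file would give me better alignments and probability estimates. Is it actually needed, or does it only improve speed?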
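The script change described in this edit can be sketched as a helper that builds the GIZA++ argument list and adds `-CoocurrenceFile` only when a cooccurrence file is supplied. The function name `giza_args` is hypothetical, the flags are the ones used in the script above, and running without a cooccurrence file is assumed to work only for a binary built without `-DBINARY_SEARCH_FOR_TTABLE`, as reported here.

```shell
#!/bin/bash
# Print the GIZA++ arguments for a language pair on one line.
# $1 source name, $2 target name, $3 output prefix,
# $4 optional cooccurrence file; -CoocurrenceFile is added only if $4 is set.
giza_args() {
    local src="$1"
    local trg="$2"
    local prefix="$3"
    local cooc="${4:-}"
    local args="-S ${src}.vcb -T ${trg}.vcb -C ${src}_${trg}.snt -o ${prefix}"
    if [ -n "$cooc" ]; then
        args="${args} -CoocurrenceFile ${cooc}"
    fi
    echo "$args"
}
```

In the test script this could be used as `${GIZA}/GIZA++-v2/GIZA++ $(giza_args ${SRC} ${TRG} ${PREFIX}) >> giza.log 2>&1`. The unquoted substitution relies on word splitting, so it is only suitable for file names without spaces.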

Best answer

I ran into the same problem before. I think the missing step is this: in the Makefile located in .\giza-pp\GIZA++-v2\, replace the following line:

CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DBINARY_SEARCH_FOR_TTABLE -DWORDINDEX_WITH_4_BYTE

with this line:

CFLAGS_OPT = $(CFLAGS) -O3 -funroll-loops -DNDEBUG -DWORDINDEX_WITH_4_BYTE -DWORDINDEX_WITH_4_BYTE
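The same Makefile change can be scripted rather than made by hand. The following is a minimal sketch, with a hypothetical function name `strip_cflag`; it assumes the flag appears in the Makefile preceded by a single space, as in the line quoted above.

```shell
#!/bin/bash
# Remove one compiler flag from a Makefile in place.
# The original file is kept next to it with a .bak suffix.
# $1 path to the Makefile, $2 flag to remove (defaults to -DBINARY_SEARCH_FOR_TTABLE)
strip_cflag() {
    local makefile="$1"
    local flag="${2:--DBINARY_SEARCH_FOR_TTABLE}"
    # Delete the flag together with the space in front of it.
    sed -i.bak "s/ ${flag}//g" "$makefile"
}
```

After running `strip_cflag giza-pp/GIZA++-v2/Makefile`, GIZA++ has to be rebuilt from scratch (for example by removing the old object files and running `make` again) so that every file is compiled without the flag.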

Give this a try, and good luck.

The original question, "c++ - GIZA++ output is missing the *.ti.final and *.actual.ti.final files", can be found on Stack Overflow: https://stackoverflow.com/questions/33305090/
