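Based on the answers to this question, I'm thinking that I've supplied the "wrong decoder" for my .pb file.
This is the data I'm trying to decode.
Based on the example in the ListPeople.java Java tutorial documentation, I tried to write something similar to start picking apart the data. I wrote this: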
import cc.refectorie.proj.relation.protobuf.DocumentProtos.Document;
import cc.refectorie.proj.relation.protobuf.DocumentProtos.Document.Sentence;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ListDocument
{
    // Iterates through all sentences in the Document and prints their tokens.
    static void print(Document document)
    {
        for (Sentence sentence : document.getSentencesList())
        {
            for (int i = 0; i < sentence.getTokensCount(); i++)
            {
                System.out.println(" getTokens(" + i + "): " + sentence.getTokens(i));
            }
        }
    }

    // Main function: reads a Document from a file and prints its tokens.
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: ListDocument DOCUMENT_FILE");
            System.exit(-1);
        }
        // Read the whole file as one single Document message.
        try (InputStream in = new FileInputStream(args[0])) {
            Document document = Document.parseFrom(in);
            print(document);
        }
    }
}
But when I run it, I get this error:
Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:174)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:194)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:210)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:215)
at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
at cc.refectorie.proj.relation.protobuf.DocumentProtos$Document.parseFrom(DocumentProtos.java:4770)
at ListDocument.main(ListDocument.java:40)
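So, as I said above, I think this has to do with not defining the decoder correctly. Is there some way to look at the .proto file I'm trying to use, see what I'm doing wrong, and come up with a way to read all of this data?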
These are the first few lines of the file I'm trying to read:
Ü
&/guid/9202a8c04000641f8000000003221072&/guid/9202a8c04000641f80000000004cfd50NA"Ö
S/m/vinci8/data1/riedel/projects/relation/kb/nyt1/docstore/2007-joint/1850511.xml.pb„€€€øÿÿÿÿƒ€€€øÿÿÿÿ"PERSON->PERSON"'inverse_false|PERSON|on bass and|PERSON"/inverse_false|with|PERSON|on bass and|PERSON|on"7inverse_false|, with|PERSON|on bass and|PERSON|on drums"$inverse_false|PERSON|IN NN CC|PERSON",inverse_false|with|PERSON|IN NN CC|PERSON|on"4inverse_false|, with|PERSON|IN NN CC|PERSON|on drums"`str:Dave[NMOD]->|PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"]str:Dave[NMOD]->|PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Rstr:Dave[NMOD]->|PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON"Adep:[NMOD]->|PERSON|[PMOD]->[ADV]->[ROOT]<-[PRD]<-[PMOD]<-|PERSON"dir:->|PERSON|->-><-<-<-|PERSON"Sstr:PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"Adep:PERSON|[PMOD]->[ADV]->[ROOT]<-[PRD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Pstr:PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Adep:PERSON|[PMOD]->[ADV]->[ROOT]<-[PRD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Estr:PERSON|[PMOD]->with[ADV]->was[ROOT]<-on[PRD]<-bass[PMOD]<-|PERSON*ŒThe occasion was suitably exceptional : a reunion of the 1970s-era Sam Rivers Trio , with Dave Holland on bass and Barry Altschul on drums ."¬
S/m/vinci8/data1/riedel/projects/relation/kb/nyt1/docstore/2007-joint/1849689.xml.pb†€€€øÿÿÿÿ…€€€øÿÿÿÿ"PERSON->PERSON"'inverse_false|PERSON|on bass and|PERSON"/inverse_false|with|PERSON|on bass and|PERSON|on"7inverse_false|, with|PERSON|on bass and|PERSON|on drums"$inverse_false|PERSON|IN NN CC|PERSON",inverse_false|with|PERSON|IN NN CC|PERSON|on"4inverse_false|, with|PERSON|IN NN CC|PERSON|on drums"cstr:Dave[NMOD]->|PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"`str:Dave[NMOD]->|PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Ustr:Dave[NMOD]->|PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON"Cdep:[NMOD]->|PERSON|[PMOD]->[NMOD]->[NULL]<-[NMOD]<-[PMOD]<-|PERSON"dir:->|PERSON|->-><-<-<-|PERSON"Vstr:PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->Barry"Cdep:PERSON|[PMOD]->[NMOD]->[NULL]<-[NMOD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Sstr:PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON|[NMOD]->on"Cdep:PERSON|[PMOD]->[NMOD]->[NULL]<-[NMOD]<-[PMOD]<-|PERSON|[NMOD]->"dir:PERSON|->-><-<-<-|PERSON|->"Hstr:PERSON|[PMOD]->with[NMOD]->Trio[NULL]<-on[NMOD]<-bass[PMOD]<-|PERSON*ÊTonight he brings his energies and expertise to the Miller Theater for the festival 's thrilling finale : a reunion of the 1970s Sam Rivers Trio , with Dave Holland on bass and Barry Altschul on drums .â
&/guid/9202a8c04000641f80000000004cfd50&/guid/9202a8c04000641f8000000003221072NA"Ù
Edit
Here is the code another researcher used to parse these files. I've been told I can use it; is that right?
package edu.stanford.nlp.kbp.slotfilling.multir;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import edu.stanford.nlp.kbp.slotfilling.classify.MultiLabelDataset;
import edu.stanford.nlp.kbp.slotfilling.common.Log;
import edu.stanford.nlp.kbp.slotfilling.multir.DocumentProtos.Relation;
import edu.stanford.nlp.stats.ClassicCounter;
import edu.stanford.nlp.stats.Counter;
import edu.stanford.nlp.util.ErasureUtils;
import edu.stanford.nlp.util.HashIndex;
import edu.stanford.nlp.util.Index;
/**
* Converts Hoffmann's data in protobuf format to our MultiLabelDataset
* @author Mihai
*
*/
public class ProtobufToMultiLabelDataset {
  static class RelationAndMentions {
    String arg1;
    String arg2;
    Set<String> posLabels;
    Set<String> negLabels;
    List<Mention> mentions;

    public RelationAndMentions(String types, String a1, String a2) {
      arg1 = a1;
      arg2 = a2;
      String [] rels = types.split(",");
      posLabels = new HashSet<String>();
      for(String r: rels){
        if(! r.equals("NA")) posLabels.add(r.trim());
      }
      negLabels = new HashSet<String>(); // will be populated later
      mentions = new ArrayList<Mention>();
    }
  }

  static class Mention {
    List<String> features;
    public Mention(List<String> feats) {
      features = feats;
    }
  }

  public static void main(String[] args) throws Exception {
    String input = args[0];
    InputStream is = new GZIPInputStream(
        new BufferedInputStream(new FileInputStream(input)));
    toMultiLabelDataset(is);
    is.close();
  }

  public static MultiLabelDataset<String, String> toMultiLabelDataset(InputStream is) throws IOException {
    List<RelationAndMentions> relations = toRelations(is, true);
    MultiLabelDataset<String, String> dataset = toDataset(relations);
    return dataset;
  }

  public static void toDatums(InputStream is,
      List<List<Collection<String>>> relationFeatures,
      List<Set<String>> labels) throws IOException {
    List<RelationAndMentions> relations = toRelations(is, false);
    toDatums(relations, relationFeatures, labels);
  }

  private static void toDatums(List<RelationAndMentions> relations,
      List<List<Collection<String>>> relationFeatures,
      List<Set<String>> labels) {
    for(RelationAndMentions rel: relations) {
      labels.add(rel.posLabels);
      List<Collection<String>> mentionFeatures = new ArrayList<Collection<String>>();
      for(int i = 0; i < rel.mentions.size(); i ++){
        mentionFeatures.add(rel.mentions.get(i).features);
      }
      relationFeatures.add(mentionFeatures);
    }
    assert(labels.size() == relationFeatures.size());
  }

  public static List<RelationAndMentions> toRelations(InputStream is, boolean generateNegativeLabels) throws IOException {
    //
    // Parse the protobuf
    //
    // all relations are stored here
    List<RelationAndMentions> relations = new ArrayList<RelationAndMentions>();
    // all known relations (without NIL)
    Set<String> relTypes = new HashSet<String>();
    Map<String, Map<String, Set<String>>> knownRelationsPerEntity =
        new HashMap<String, Map<String,Set<String>>>();
    Counter<Integer> labelCountHisto = new ClassicCounter<Integer>();
    Relation r = null;
    while ((r = Relation.parseDelimitedFrom(is)) != null) {
      RelationAndMentions relation = new RelationAndMentions(
          r.getRelType(), r.getSourceGuid(), r.getDestGuid());
      labelCountHisto.incrementCount(relation.posLabels.size());
      relTypes.addAll(relation.posLabels);
      relations.add(relation);
      for(int i = 0; i < r.getMentionCount(); i ++) {
        DocumentProtos.Relation.RelationMentionRef mention = r.getMention(i);
        // String s = mention.getSentence();
        relation.mentions.add(new Mention(mention.getFeatureList()));
      }
      for(String l: relation.posLabels) {
        addKnownRelation(relation.arg1, relation.arg2, l, knownRelationsPerEntity);
      }
    }
    Log.severe("Loaded " + relations.size() + " relations.");
    Log.severe("Found " + relTypes.size() + " relation types: " + relTypes);
    Log.severe("Label count histogram: " + labelCountHisto);

    Counter<Integer> slotCountHisto = new ClassicCounter<Integer>();
    for(String e: knownRelationsPerEntity.keySet()) {
      slotCountHisto.incrementCount(knownRelationsPerEntity.get(e).size());
    }
    Log.severe("Slot count histogram: " + slotCountHisto);

    int negativesWithKnownPositivesCount = 0, totalNegatives = 0;
    for(RelationAndMentions rel: relations) {
      if(rel.posLabels.size() == 0) {
        if(knownRelationsPerEntity.get(rel.arg1) != null &&
            knownRelationsPerEntity.get(rel.arg1).size() > 0) {
          negativesWithKnownPositivesCount ++;
        }
        totalNegatives ++;
      }
    }
    Log.severe("Found " + negativesWithKnownPositivesCount + "/" + totalNegatives +
        " negative examples with at least one known relation for arg1.");

    Counter<Integer> mentionCountHisto = new ClassicCounter<Integer>();
    for(RelationAndMentions rel: relations) {
      mentionCountHisto.incrementCount(rel.mentions.size());
      if(rel.mentions.size() > 100)
        Log.fine("Large relation: " + rel.mentions.size() + "\t" + rel.posLabels);
    }
    Log.severe("Mention count histogram: " + mentionCountHisto);

    //
    // Detect the known negatives for each source entity
    //
    if(generateNegativeLabels) {
      for(RelationAndMentions rel: relations) {
        Set<String> negatives = new HashSet<String>(relTypes);
        negatives.removeAll(rel.posLabels);
        rel.negLabels = negatives;
      }
    }

    return relations;
  }

  private static MultiLabelDataset<String, String> toDataset(List<RelationAndMentions> relations) {
    int [][][] data = new int[relations.size()][][];
    Index<String> featureIndex = new HashIndex<String>();
    Index<String> labelIndex = new HashIndex<String>();
    Set<Integer> [] posLabels = ErasureUtils.<Set<Integer> []>uncheckedCast(new Set[relations.size()]);
    Set<Integer> [] negLabels = ErasureUtils.<Set<Integer> []>uncheckedCast(new Set[relations.size()]);

    int offset = 0, posCount = 0;
    for(RelationAndMentions rel: relations) {
      Set<Integer> pos = new HashSet<Integer>();
      Set<Integer> neg = new HashSet<Integer>();
      for(String l: rel.posLabels) {
        pos.add(labelIndex.indexOf(l, true));
      }
      for(String l: rel.negLabels) {
        neg.add(labelIndex.indexOf(l, true));
      }
      posLabels[offset] = pos;
      negLabels[offset] = neg;
      int [][] group = new int[rel.mentions.size()][];
      for(int i = 0; i < rel.mentions.size(); i ++){
        List<String> sfeats = rel.mentions.get(i).features;
        int [] features = new int[sfeats.size()];
        for(int j = 0; j < sfeats.size(); j ++) {
          features[j] = featureIndex.indexOf(sfeats.get(j), true);
        }
        group[i] = features;
      }
      data[offset] = group;
      posCount += posLabels[offset].size();
      offset ++;
    }

    Log.severe("Creating a dataset with " + data.length + " datums, out of which " + posCount + " are positive.");
    MultiLabelDataset<String, String> dataset = new MultiLabelDataset<String, String>(
        data, featureIndex, labelIndex, posLabels, negLabels);
    return dataset;
  }

  private static void addKnownRelation(String arg1, String arg2, String label,
      Map<String, Map<String, Set<String>>> knownRelationsPerEntity) {
    Map<String, Set<String>> myRels = knownRelationsPerEntity.get(arg1);
    if(myRels == null) {
      myRels = new HashMap<String, Set<String>>();
      knownRelationsPerEntity.put(arg1, myRels);
    }
    Set<String> mySlots = myRels.get(label);
    if(mySlots == null) {
      mySlots = new HashSet<String>();
      myRels.put(label, mySlots);
    }
    mySlots.add(arg2);
  }
}
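Best answer
Updated; the confusion here is twofold:
- the root message is Relation, not Document (in fact, only Relation and RelationMentionRef are used at all);
- the file is not one single message but a sequence of messages, each prefixed with its length.
As such, Relation.parseDelimitedFrom should work. Processing it manually, I get: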
test-multiple.pb, 96678 Relation objects parsed
testNegative.pb, 94917 Relation objects parsed
testPositive.pb, 1950 Relation objects parsed
trainNegative.pb, 63596 Relation objects parsed
trainPositive.pb, 4700 Relation objects parsed
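To see what "delimited" means without the generated DocumentProtos classes, here is a minimal, library-free sketch of the framing that parseDelimitedFrom expects: a base-128 varint byte count followed by that many bytes of message payload, repeated until end of stream. The class name DelimitedFrames is hypothetical, the payloads are not parsed, and the stream is assumed to be already decompressed. Note that the leading bytes DC 16 seen in the hex dumps below decode, as a varint, to 2908, which is consistent with a length prefix.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (hypothetical helper): splits a stream of length-delimited protobuf
 * messages into raw payloads. Each frame is assumed to be a varint byte count
 * followed by exactly that many bytes.
 */
final class DelimitedFrames {

    /** Reads one varint; returns -1 if the stream ends cleanly before the first byte. */
    static long readVarint(InputStream in) throws IOException {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                if (shift == 0) return -1;              // clean end: no more frames
                throw new EOFException("truncated varint");
            }
            result |= (long) (b & 0x7F) << shift;     // low 7 bits carry data
            if ((b & 0x80) == 0) return result;        // high bit clear: last byte
            shift += 7;
            if (shift >= 64) throw new IOException("malformed varint");
        }
    }

    /** Returns every payload in order; the payloads themselves are not parsed. */
    static List<byte[]> split(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> frames = new ArrayList<>();
        long length;
        while ((length = readVarint(data)) != -1) {
            byte[] payload = new byte[Math.toIntExact(length)];
            data.readFully(payload);                   // EOFException if the last frame is cut short
            frames.add(payload);
        }
        return frames;
    }
}
```

If the framing assumption holds, calling DelimitedFrames.split on one of the decompressed files should give one payload per Relation; each payload is what Relation.parseFrom(byte[]) would then parse.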
Old, outdated, exploratory:
I extracted your 4 documents and ran them through a small test rig:
ProcessFile("testNegative.pb");
ProcessFile("testPositive.pb");
ProcessFile("trainNegative.pb");
ProcessFile("trainPositive.pb");
where ProcessFile first dumps the first 10 bytes as hex and then tries to process the file via ProtoReader. Here are the results:
Processing: testNegative.pb
dc 16 0a 26 2f 67 75 69 64 2f
> Document
Unexpected end-group in source data; this usually means the source data is corrupt
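Yes; agreed. Since 0xDC has its high (continuation) bit set, the tag is the two-byte varint DC 16 = 2908, i.e. wire type 4 (end-group) for field 363. Your Document doesn't define field 363, and even if it did, starting with an end-group makes no sense.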
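A protobuf tag is itself a varint whose value is (fieldNumber << 3) | wireType, so the tag has to be decoded over all of its bytes before it is split. The sketch below (class name TagDecoder is hypothetical) does that for the leading bytes of each file. Reading 0xDC on its own would give field 27, but because its high bit is set the real tag is DC 16.

```java
/** Sketch (hypothetical helper): decodes a protobuf tag from the start of a byte array. */
final class TagDecoder {

    /** Reads the base-128 varint starting at offset 0 (at most 10 bytes). */
    static long varint(byte[] bytes) {
        long value = 0;
        for (int i = 0; i < bytes.length && i < 10; i++) {
            value |= (long) (bytes[i] & 0x7F) << (7 * i);   // 7 data bits per byte, little-endian
            if ((bytes[i] & 0x80) == 0) return value;       // high bit clear: last byte
        }
        throw new IllegalArgumentException("unterminated varint");
    }

    /** A tag is (fieldNumber << 3) | wireType. */
    static long fieldNumber(long tag) {
        return tag >>> 3;
    }

    static int wireType(long tag) {
        return (int) (tag & 0x7);
    }

    public static void main(String[] args) {
        long tag = varint(new byte[] {(byte) 0xdc, 0x16});
        System.out.println(tag + " -> field " + fieldNumber(tag) + ", wire type " + wireType(tag));
        // prints: 2908 -> field 363, wire type 4
    }
}
```

Applied to the other files' leading bytes, the same arithmetic gives D5 0F -> field 250, wire type 5 (Fixed32); D1 09 -> field 154, wire type 1 (Fixed64); and CF 75 -> field 1881, wire type 7, matching the ProtoReader output below.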
Processing: testPositive.pb
d5 0f 0a 26 2f 67 75 69 64 2f
> Document
250: Fixed32, Unexpected field
14: Fixed32, Unexpected field
6: String, Unexpected field
6: Variant, Unexpected field
Unexpected end-group in source data; this usually means the source data is corrupt
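We can't see the offending data in this hex dump, but again: the initial fields look nothing like your schema, and the reader readily confirms that the data is corrupt.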
Processing: trainNegative.pb
d1 09 0a 26 2f 67 75 69 64 2f
> Document
154: Fixed64, Unexpected field
7: Fixed64, Unexpected field
6: Variant, Unexpected field
6: Variant, Unexpected field
Unexpected end-group in source data; this usually means the source data is corrupt
Ditto.
Processing: trainPositive.pb
cf 75 0a 26 2f 67 75 69 64 2f
> Document
1881: 7, Unexpected field
Invalid wire-type; this usually means you have over-written a file without truncating or setting the length; see http://stackoverflow.com/q/2152978/23354
CF 75 is a two-byte varint tag with wire type 7, which is not defined in the specification.
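Only wire types 0 to 5 are defined, which is why a tag whose low three bits are 6 or 7 can be rejected immediately. The sketch below (class name WireTypes is hypothetical) maps wire types to the names used in the current protobuf encoding documentation; protobuf-net's ProtoReader, as in the output above, uses its own names such as Variant, Fixed64 and Fixed32.

```java
/** Sketch (hypothetical helper): maps a protobuf wire type (tag & 7) to its spec name. */
final class WireTypes {

    /** Returns the wire-type name, or null for the undefined values 6 and 7. */
    static String name(int wireType) {
        switch (wireType) {
            case 0: return "VARINT";  // int32, int64, uint32, bool, enum, ...
            case 1: return "I64";     // fixed64, sfixed64, double
            case 2: return "LEN";     // string, bytes, embedded messages, packed repeated
            case 3: return "SGROUP";  // group start (deprecated)
            case 4: return "EGROUP";  // group end (deprecated)
            case 5: return "I32";     // fixed32, sfixed32, float
            default: return null;     // 6 and 7 are not defined
        }
    }

    public static void main(String[] args) {
        long tag = 15055;             // value of the two-byte varint CF 75
        int wireType = (int) (tag & 7);
        String label = name(wireType);
        System.out.println("field " + (tag >>> 3) + ", wire type " + wireType + ": "
                + (label == null ? "undefined" : label));
        // prints: field 1881, wire type 7: undefined
    }
}
```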
Your data really is garbage. Sorry.
And the bonus round from the comments, test-multiple.pb (after gunzipping):
Processing: test-multiple.pb
dc 16 0a 26 2f 67 75 69 64 2f
> Document
Unexpected end-group in source data; this usually means the source data is corrupt
This is the same as testNegative.pb, so it fails for exactly the same reason.
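Because test-multiple.pb had to be gunzipped first, and the Stanford main wraps its input in a GZIPInputStream, it can help to detect compression automatically. This is a minimal sketch, assuming a gzip file is recognised by its two magic bytes 1F 8B (RFC 1952); the class name MaybeGzip is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Sketch (hypothetical helper): opens a stream, gunzipping it only if it is gzip-compressed. */
final class MaybeGzip {

    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);                   // remember the start so we can rewind after peeking
        int b0 = in.read();
        int b1 = in.read();
        in.reset();                   // the caller (or GZIPInputStream) sees the whole stream
        if (b0 == 0x1f && b1 == 0x8b) {
            return new GZIPInputStream(in);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = open(new FileInputStream(args[0]))) {
            System.out.println(in.readAllBytes().length + " bytes after optional gunzip");
        }
    }
}
```

The returned stream can be handed straight to Relation.parseDelimitedFrom in a loop, whether or not the file on disk was compressed.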
The original Stack Overflow question, "java - How to write a valid decoding file based on a given .proto reading from a .pb": https://stackoverflow.com/questions/29531899/