
java - HashMap: removing duplicates while also storing the number of original occurrences

Reposted. Author: 行者123. Last updated: 2023-12-01 11:58:50

I have a Java HashMap that I use to generate "rules" learned through inference. For example, the input might look like this:

'prevents'('scurvy','vitamin C').
'contains'('vitamin C','orange').
'contains'('vitamin C','sauerkraut').
'isa'('fruit','orange').
'improves'('health','fruit').

And the output might look like this:

prevents(scurvy, orange).
prevents(scurvy, sauerkraut).
improves(health, orange).

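For a small test everything works fine, but in my real data set I have many instances of the same rule. I would like to store the number of occurrences of each rule and write that count to the file alongside the rule, because I think it could serve as a naive confidence measure of how likely the rule is to be a good one.

At the moment I store sentences like this: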

public class Sentence {
    private String verb;
    private String object;
    private String subject;
    public Sentence(String verb, String object, String subject ){
        this.verb = verb;
        this.object = object;
        this.subject = subject;
    }
    public String getVerb(){ return verb; }
    public String getObject(){ return object; }
    public String getSubject(){ return subject; }
    public String toString(){
        return verb + "(" + object + ", " + subject + ")";
    }
}

The HashMap construction:

public class Ontology {
    private List<Sentence> sentences = new ArrayList<>();
    /*
     * The following maps store the relation of a string occurring
     * as a subject or object, respectively, to the list of Sentence
     * ordinals where they occur.
     */
    private Map<String,List<Integer>> subject2index = new HashMap<>();
    private Map<String,List<Integer>> object2index = new HashMap<>();
    /*
     * This set contains strings that occur as both,
     * subject and object. This is useful for determining strings
     * acting as an in-between connecting two relations.
     */
    private Set<String> joints = new HashSet<>();
    public void addSentence( Sentence s ){
        // add Sentence to the list of all Sentences
        sentences.add( s );
        // add the Subject of the Sentence to the map mapping strings
        // occurring as a subject to the ordinal of this Sentence
        List<Integer> subind = subject2index.get( s.getSubject() );
        if( subind == null ){
            subind = new ArrayList<>();
            subject2index.put( s.getSubject(), subind );
        }
        subind.add( sentences.size() - 1 );
        // add the Object of the Sentence to the map mapping strings
        // occurring as an object to the ordinal of this Sentence
        List<Integer> objind = object2index.get( s.getObject() );
        if( objind == null ){
            objind = new ArrayList<>();
            object2index.put( s.getObject(), objind );
        }
        objind.add( sentences.size() - 1 );
        // determine whether we've found a "joining" string
        if( subject2index.containsKey( s.getObject() ) ){
            joints.add( s.getObject() );
        }
        if( object2index.containsKey( s.getSubject() ) ){
            joints.add( s.getSubject() );
        }
    }
    public Collection<String> getJoints(){
        return joints;
    }
    public List<Integer> getSubjectIndices( String subject ){
        return subject2index.get( subject );
    }
    public List<Integer> getObjectIndices( String object ){
        return object2index.get( object );
    }
    public Sentence getSentence( int index ){
        return sentences.get( index );
    }
}

Finally, the code that determines the rules:

public static void main(String[] args) throws IOException {
    Ontology ontology = new Ontology();
    BufferedReader br = new BufferedReader(new FileReader("file.txt"));
    Pattern p = Pattern.compile("'(.*?)'\\('(.*?)','(.*?)'\\)");
    String line;
    while ((line = br.readLine()) != null) {
        Matcher m = p.matcher(line);
        if( m.matches() ) {
            String verb = m.group(1);
            String object = m.group(2);
            String subject = m.group(3);
            ontology.addSentence( new Sentence( verb, object, subject ) );
        }
    }

    for( String joint: ontology.getJoints() ){
        for( Integer subind: ontology.getSubjectIndices( joint ) ){
            Sentence xaS = ontology.getSentence( subind );
            for( Integer obind: ontology.getObjectIndices( joint ) ){
                Sentence yOb = ontology.getSentence( obind );
                Sentence s = new Sentence( xaS.getVerb(),
                                           xaS.getObject(),
                                           yOb.getSubject() );
                System.out.println( s );
            }
        }
    }
}

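Is there a quick and efficient way to eliminate the duplicates in this HashMap, keeping only one instance of each unique rule, while also associating that instance with the number of identical instances of the rule observed in the original map?

I want to eliminate duplicate "rules" after processing the sentences, but only once I have had a chance to count how often each rule occurs and to save that count as the value associated with the unique rule I end up keeping.

Best answer

I would suggest a few changes to your data model. You can very easily store the number of times each sentence occurs in a Map, like this: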

Map<Sentence, Integer> sentenceCount = new HashMap<>();

This relies on Sentence implementing the equals and hashCode methods. Using Sentence as the key then eliminates duplicates automatically.
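
The Sentence class shown in the question does not override equals or hashCode, so two sentences with the same fields would still count as different keys. A minimal sketch of one way to add value-based equality follows; making the fields final and using Objects.equals and Objects.hash are choices made for this illustration, not something the answer prescribes.

```java
import java.util.Objects;

// Sketch: the question's Sentence class with value-based equality added,
// so that it can be used as a HashMap key.
final class Sentence {
    private final String verb;
    private final String object;
    private final String subject;

    Sentence(String verb, String object, String subject) {
        this.verb = verb;
        this.object = object;
        this.subject = subject;
    }

    String getVerb() { return verb; }
    String getObject() { return object; }
    String getSubject() { return subject; }

    // Two sentences are equal when all three fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Sentence)) return false;
        Sentence that = (Sentence) other;
        return Objects.equals(verb, that.verb)
            && Objects.equals(object, that.object)
            && Objects.equals(subject, that.subject);
    }

    // Must be consistent with equals: built from the same three fields.
    @Override
    public int hashCode() {
        return Objects.hash(verb, object, subject);
    }

    @Override
    public String toString() {
        return verb + "(" + object + ", " + subject + ")";
    }
}
```

With this in place, putting two separately constructed but identical sentences into a HashMap leaves a single entry.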

You can add a new sentence as follows:

public void addSentence(Sentence sentence) {
    if (!sentenceCount.containsKey(sentence))
        sentenceCount.put(sentence, 0);
    sentenceCount.put(sentence, sentenceCount.get(sentence) + 1);
}
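
On Java 8 or later, the same check-then-increment logic can be written with Map.merge. The sketch below is an alternative to the method above, not part of the original answer; OccurrenceCounter is a hypothetical helper name, and it works for any key type with proper equals and hashCode, such as a Sentence that overrides both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts how often each distinct item was added.
final class OccurrenceCounter<T> {
    private final Map<T, Integer> counts = new HashMap<>();

    // Inserts 1 for a new key, otherwise adds 1 to the existing count.
    void add(T item) {
        counts.merge(item, 1, Integer::sum);
    }

    // Returns 0 for items that were never added.
    int count(T item) {
        return counts.getOrDefault(item, 0);
    }

    // The distinct items, i.e. the de-duplicated rules.
    Set<T> uniqueItems() {
        return counts.keySet();
    }
}
```

A caller would invoke add once per derived rule, for example in place of the System.out.println call in the question's loop, and read the counts afterwards.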

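Now you no longer need the sentences list, because you can get the set of sentences with sentenceCount.keySet().

If you need a mapping from subjects and objects to sentences, I would not recommend using indices: that approach is error-prone. Instead, I suggest mapping them directly: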
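
Since the question also asks to write each rule to a file together with its count, here is a rough sketch of that step. RuleCountWriter is a hypothetical name, and the tab-separated "rule, count" line layout is an assumption; the question does not specify an output format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for dumping a count map to a text file.
final class RuleCountWriter {
    // Builds one "rule<TAB>count" line per unique key, using the key's toString().
    static List<String> toLines(Map<?, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<?, Integer> entry : counts.entrySet()) {
            lines.add(entry.getKey() + "\t" + entry.getValue());
        }
        return lines;
    }

    // Writes the lines to the given file as UTF-8, replacing existing content.
    static void write(Map<?, Integer> counts, Path target) throws IOException {
        Files.write(target, toLines(counts), StandardCharsets.UTF_8);
    }
}
```

A HashMap has no defined iteration order, so the lines come out in arbitrary order unless the map is a LinkedHashMap or the lines are sorted first.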

Map<String, Set<Sentence>> subjectMap;
Map<String, Set<Sentence>> objectMap;

You can use these maps to find the number of times a given subject occurs:

subjectMap.get("subject").stream().mapToInt(sentenceCount::get).sum();
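
The answer does not show how these maps get filled. The following sketch keeps the count map and both lookup maps in step inside one add method. CountingIndex and its accessor parameters are hypothetical names introduced for this illustration; with the question's class, the accessors would be Sentence::getSubject and Sentence::getObject, and Sentence would need equals and hashCode.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical index: counts sentences and maps subjects/objects to them directly.
final class CountingIndex<S> {
    private final Function<S, String> subjectOf;
    private final Function<S, String> objectOf;
    private final Map<S, Integer> sentenceCount = new HashMap<>();
    private final Map<String, Set<S>> subjectMap = new HashMap<>();
    private final Map<String, Set<S>> objectMap = new HashMap<>();

    CountingIndex(Function<S, String> subjectOf, Function<S, String> objectOf) {
        this.subjectOf = subjectOf;
        this.objectOf = objectOf;
    }

    // Increments the sentence's count and registers it under its subject and object.
    void add(S sentence) {
        sentenceCount.merge(sentence, 1, Integer::sum);
        subjectMap.computeIfAbsent(subjectOf.apply(sentence), k -> new HashSet<>()).add(sentence);
        objectMap.computeIfAbsent(objectOf.apply(sentence), k -> new HashSet<>()).add(sentence);
    }

    // Total occurrences, duplicates included, of sentences with this subject.
    int subjectOccurrences(String subject) {
        return total(subjectMap.getOrDefault(subject, Collections.emptySet()));
    }

    // Total occurrences, duplicates included, of sentences with this object.
    int objectOccurrences(String object) {
        return total(objectMap.getOrDefault(object, Collections.emptySet()));
    }

    private int total(Set<S> sentences) {
        return sentences.stream().mapToInt(sentenceCount::get).sum();
    }
}
```

Unlike the one-liner above, the lookups here use getOrDefault, so asking about a subject that never occurred returns 0 instead of throwing a NullPointerException.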

This post on "java - HashMap: removing duplicates while also storing the number of original occurrences" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/28133881/
