java - Xodus creates a huge file when moving from key-value (Environments) to relational (Entities)


I was initially creating a key-value database with the Xodus Environment API, which produced a small database of about 2GB:

public static void main(String[] args) throws Exception {

    if (args.length != 2) {
        throw new Exception("Argument missing. Current number of arguments: " + args.length);
    }

    long offset = Long.parseLong(args[0]);
    long chunksize = Long.parseLong(args[1]);

    // Read one chunk of BabelNet lexicalizations
    Path pathBabelNet = Paths.get("/mypath/BabelNet-API-3.7/config");
    BabelNetLexicalizationDataSource dataSource = new BabelNetLexicalizationDataSource(pathBabelNet);
    Map<String, List<String>> data = new HashMap<String, List<String>>();
    data = dataSource.getDataChunk(offset, chunksize);

    // Write the chunk into a Xodus key-value store in a single transaction
    jetbrains.exodus.env.Environment env = Environments.newInstance(".myAppData");
    final Transaction txn = env.beginTransaction();
    Store store = env.openStore("xodus-lexicalizations", StoreConfig.WITHOUT_DUPLICATES, txn);

    for (Map.Entry<String, List<String>> entry : data.entrySet()) {
        String key = entry.getKey();
        String value = entry.getValue().get(0);

        store.put(txn, StringBinding.stringToEntry(key), StringBinding.stringToEntry(value));
    }

    txn.commit();
    env.close();
}

I run this in chunks using a batch script:

#!/bin/bash

START_TIME=$SECONDS

chunksize=50000

for ((offset=0; offset<165622128;))
do
    echo $offset;
    java -Xmx10g -jar /path/to/jar.jar $offset $chunksize
    offset=$((offset+(chunksize*12)))
done

ELAPSED_TIME=$(($SECONDS - $START_TIME))

echo $ELAPSED_TIME;

Now I have changed it to use the relational (Entity) API:

public static void main(String[] args) throws Exception {

    if (args.length != 2) {
        throw new Exception("Argument missing. Current number of arguments: " + args.length);
    }

    long offset = Long.parseLong(args[0]);
    long chunksize = Long.parseLong(args[1]);

    // Read one chunk of BabelNet lexicalizations
    Path pathBabelNet = Paths.get("/mypath/BabelNet-API-3.7/config");
    BabelNetLexicalizationDataSource dataSource = new BabelNetLexicalizationDataSource(pathBabelNet);
    Map<String, List<String>> data = new HashMap<String, List<String>>();
    data = dataSource.getDataChunk(offset, chunksize);

    // Store the chunk as linked SynsetID / Lexicalization entities
    PersistentEntityStore store = PersistentEntityStores.newInstance("lexicalizations-test");
    final StoreTransaction txn = store.beginTransaction();

    Entity synsetID;
    Entity lexicalization;
    String id;

    for (Map.Entry<String, List<String>> entry : data.entrySet()) {
        String key = entry.getKey();
        String value = entry.getValue().get(0);

        synsetID = txn.newEntity("SynsetID");
        synsetID.setProperty("synsetID", key);

        lexicalization = txn.newEntity("Lexicalization");
        lexicalization.setProperty("lexicalization", value);

        lexicalization.addLink("synsetID", synsetID);
        synsetID.addLink("lexicalization", lexicalization);

        txn.flush();
    }

    txn.commit();
}

This created a file of more than 17GB, and it only stopped there because it ran out of memory. I know it has to be bigger, since it has to store the links and so on, but ten times bigger? What am I doing wrong?

Best answer

For some reason, removing the txn.flush() call inside the loop fixed everything. The database is now only 5.5GB.
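As a minimal sketch of that change (entity and link names taken from the question, the surrounding setup unchanged), the loop no longer flushes per entry and the whole chunk is written by the single commit at the end:

for (Map.Entry<String, List<String>> entry : data.entrySet()) {
    String key = entry.getKey();
    String value = entry.getValue().get(0);

    Entity synsetID = txn.newEntity("SynsetID");
    synsetID.setProperty("synsetID", key);

    Entity lexicalization = txn.newEntity("Lexicalization");
    lexicalization.setProperty("lexicalization", value);

    // Links in both directions, as before
    lexicalization.addLink("synsetID", synsetID);
    synsetID.addLink("lexicalization", lexicalization);

    // No txn.flush() here; everything is persisted by the commit below
}

txn.commit();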

The original question, "Xodus creates a huge file when moving from key-value (Environments) to relational (Entities)", is on Stack Overflow: https://stackoverflow.com/questions/60263966/
