
etl - How to create only edges using OrientDB ETL

Reposted. Author: 行者123. Updated: 2023-12-04 14:25:50

I have two CSV files:

The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person



The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235



I used the OrientDB ETL tool to create vertices from the first CSV file. Now I only need to create edges to establish the friendship connections between them.

So far I have tried several configurations of the ETL JSON file. The latest is this:
{
"config": {"parallel": true},
"source": { "file": { "path": "path_to_file" } },
"extractor": { "csv": {} },
"transformers": [
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "from",
"lookup": "Person.id",
"unresolvedLinkAction": "SKIP",
"targetVertexFields":{
"id": "${input.to}"
},
"direction": "out"
}
},
{ "code": { "language": "Javascript",
"code": "print('Current record: ' + record); record;"}
}
],
"loader": {
"orientdb": {
"dbURL": "remote:<DB connection string>",
"dbType": "graph",
"classes": [
{"name": "FriendsWith", "extends": "E"}
], "indexes": [
{"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
]
}
}
}

Unfortunately, besides creating the edges, this also creates vertices with "from" and "to" properties.

When I try to remove the vertex transformer, the ETL process throws an error:
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
... 2 more

What am I missing here?

Best answer

You can import the edges with these ETL transformers:

"transformers": [
{ "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
{ "vertex": {"class": "Person", "skipDuplicates": true} },
{ "edge": { "class": "FriendsWith",
"joinFieldName": "toId",
"lookup": "Person.id",
"direction": "out"
}
},
{ "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

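The "merge" transformer joins the current CSV row with the related Person record. This is a little odd, but for some reason it is needed to associate fromId with the source person.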

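The "field" transformer removes the CSV fields added by the merge step. You can also try importing without the "field" transformer to see the difference.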
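For context, here is a sketch of how the answer's transformers might fit into a complete ETL file. It reuses the source, extractor, and loader sections from the question unchanged, including the "path_to_file" and "remote:<DB connection string>" placeholders and the "parallel" setting. It drops the question's logging "code" transformer. It assumes the CSV header is fromId,toId as shown in the question and that the Person vertices and the Person.id index already exist from the first import. It has not been verified against a running OrientDB instance.

```json
{
  "config": { "parallel": true },
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": { "class": "Person", "skipDuplicates": true } },
    { "edge": {
        "class": "FriendsWith",
        "joinFieldName": "toId",
        "lookup": "Person.id",
        "direction": "out"
    } },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        { "name": "FriendsWith", "extends": "E" }
      ],
      "indexes": [
        { "class": "Person", "fields": ["id:long"], "type": "UNIQUE" }
      ]
    }
  }
}
```

Compared with the question's configuration, the edge transformer now joins on "toId" rather than "from", and the "merge" step runs first so that the record reaching the "vertex" transformer is the existing source Person rather than a new document built from the CSV row.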

Regarding etl - How to create only edges using OrientDB ETL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33679571/
