
neo4j - Cypher LOAD CSV is eager and the action takes a long time

Reposted. Author: 行者123. Updated: 2023-12-04 20:38:00

I am loading a file with 85K rows (19 MB).
The server has 2 cores and 14 GB RAM, running CentOS 7.1 and Oracle JDK 8.
With the following server configuration, the load can take 5-10 minutes:

dbms.pagecache.memory=8g                  
cypher_parser_version=2.0
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096

Disk mounted in /etc/fstab:
UUID=fc21456b-afab-4ff0-9ead-fdb31c14151a /mnt/neodata ext4 defaults,noatime,barrier=0 1 2

Added this to /etc/security/limits.conf:
*    soft    memlock    unlimited
*    hard    memlock    unlimited
*    soft    nofile     40000
*    hard    nofile     40000

Added this to /etc/pam.d/su:
session         required        pam_limits.so

Added this to /etc/sysctl.conf:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80

Disabled journaling by running:
sudo e2fsck /dev/sdc1
sudo tune2fs /dev/sdc1
sudo tune2fs -o journal_data_writeback /dev/sdc1
sudo tune2fs -O ^has_journal /dev/sdc1
sudo e2fsck -f /dev/sdc1
sudo dumpe2fs /dev/sdc1

On top of all that, when I run the profiler I get a lot of "Eager" operators, and I really don't understand why:
PROFILE LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
WITH line LIMIT 0
MERGE (session:Session { wz_session: line.wz_session })
MERGE (page:Page { page_key: line.domain + line.page })
  ON CREATE SET page.name = line.page, page.domain = line.domain,
                page.protocol = line.protocol, page.file = line.file


Compiler CYPHER 2.3

Planner RULE

Runtime INTERPRETED

+---------------+------+---------+---------------------+--------------------------------------------------------+
| Operator      | Rows | DB Hits | Identifiers         | Other                                                  |
+---------------+------+---------+---------------------+--------------------------------------------------------+
| +EmptyResult  |    0 |       0 |                     |                                                        |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph  |    9 |       9 | line, page, session | MergeNode; Add(line.domain,line.page); :Page(page_key) |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Eager        |    9 |       0 | line, session       |                                                        |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph  |    9 |       9 | line, session       | MergeNode; line.wz_session; :Session(wz_session)       |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +ColumnFilter |    9 |       0 | line                | keep columns line                                      |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Filter       |    9 |       0 | anon[181], line     | anon[181]                                              |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +Extract      |    9 |       0 | anon[181], line     | anon[181]                                              |
| |             +------+---------+---------------------+--------------------------------------------------------+
| +LoadCSV      |    9 |       0 | line                |                                                        |
+---------------+------+---------+---------------------+--------------------------------------------------------+

All the labels and properties have indexes/constraints.
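(A sketch of what those constraints would look like in Cypher 2.x syntax; the label and property names are assumed from the query above:)

```cypher
// Uniqueness constraints backing the two MERGE lookups (assumed form)
CREATE CONSTRAINT ON (s:Session) ASSERT s.wz_session IS UNIQUE;
CREATE CONSTRAINT ON (p:Page)    ASSERT p.page_key   IS UNIQUE;
```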
Thanks for your help,
Lior

Best Answer

Hi Lior,

We have tried to explain eager loading here:

Mark's original blog post is here: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/

Rik tried to explain it in simpler terms:

http://blog.bruggen.com/2015/07/loading-belgian-corporate-registry-into_20.html

Trying to understand the "Eager Operation"

I had read about this before, but didn't really understand it until Andrés explained it to me again: in all normal operations, Cypher loads data lazily. See, for example, this page in the manual - it basically loads as little data into memory as possible while performing an operation. That laziness is usually a very good thing. But it can also cause you a lot of trouble - as Michael explained to me:

"Cypher tries to honor the contract that the different operations within a statement are not affecting each other. Otherwise you might up with non-deterministic behavior or endless loops. Imagine a statement like this:
MATCH (n:Foo) WHERE n.value > 100 CREATE (m:Foo { value: n.value + 100 });

If the two statements would not be isolated, then each node the CREATE generates would cause the MATCH to match again etc., an endless loop. That's why in such cases, Cypher eagerly runs all MATCH statements to exhaustion so that all the intermediate results are accumulated and kept (in memory).

Usually with most operations that's not an issue as we mostly match only a few hundred thousand elements max.

With data imports using LOAD CSV, however, this operation will pull in ALL the rows of the CSV (which might be millions), execute all operations eagerly (which might be millions of creates/merges/matches) and also keeps the intermediate results in memory to feed the next operations in line.

This also disables PERIODIC COMMIT effectively because when we get to the end of the statement execution all create operations will already have happened and the gigantic tx-state has accumulated."



That is exactly what happened with my LOAD CSV query: the MATCH/MERGE/CREATE combination caused an Eager pipe to be added to the execution plan, and it effectively disabled batching my operations with USING PERIODIC COMMIT. Apparently quite a few users have run into this, even with seemingly simple LOAD CSV statements. Often you can avoid it, but sometimes you can't.
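The usual workaround (a sketch, reusing the file path and property names from the question; batch size assumed) is to split the statement so that each pass performs a single MERGE. Then no Eager pipe is needed and USING PERIODIC COMMIT works again:

```cypher
// Pass 1: MERGE only the Session nodes.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line FIELDTERMINATOR '|'
MERGE (session:Session { wz_session: line.wz_session });

// Pass 2: MERGE only the Page nodes.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line FIELDTERMINATOR '|'
MERGE (page:Page { page_key: line.domain + line.page })
  ON CREATE SET page.name = line.page, page.domain = line.domain,
                page.protocol = line.protocol, page.file = line.file;
```

Because each statement is executed on its own, the planner never has to isolate two updating clauses from each other, so no intermediate results need to be held in memory.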

For "neo4j - Cypher LOAD CSV eager and long action duration", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31788513/
