java - Poor performance when inserting nodes into Neo4j using the Java API

Reposted by: 搜寻专家. Updated: 2023-10-30 20:09:00

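I am trying to insert about 2 million nodes into Neo4j, but I am running into performance problems.

I am using Neo4j Enterprise 2.2.0 with a server extension written in Java. My machine has an SSD, 32 GB of RAM, and an Intel Core i7 CPU, and runs Windows 8. I run a standalone instance of the server and start it with Neo4j.bat from the bin folder.

Right now, inserting 10,000 nodes without relationships takes about 25 seconds (I will need to add relationships later, but one problem at a time).

I assumed this was a configuration issue, so I tried a number of settings, but performance did not change. What I find odd is that the java process never allocates more than 3 GB, even though I set initmemory and maxmemory to 15000 in neo4j-wrapper.conf.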
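One way to narrow this down is to check the heap limit the JVM actually received, since the process size reported by the operating system is not the same thing as the configured maximum heap. The following is a minimal sketch that uses only the standard `Runtime` API. The class name `HeapCheck` is hypothetical, and it would have to be called from code running inside the server JVM (for example from the server extension) to say anything about Neo4j.

```java
// Hypothetical helper (not part of Neo4j): reports the heap limits of the
// JVM it runs in, using only java.lang.Runtime.
final class HeapCheck {

    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** One-line summary: maximum heap, currently committed heap, free part of it. */
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        return "max=" + (rt.maxMemory() / MB) + "MB"
                + ", committed=" + (rt.totalMemory() / MB) + "MB"
                + ", free=" + (rt.freeMemory() / MB) + "MB";
    }
}
```

If `maxHeapMb()` reports something close to 15000, the wrapper settings were applied and the 3 GB figure is only what the process has used so far; if it reports far less, the settings are probably not reaching the JVM.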

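I have attached my code and configuration below. Does anyone know what I am doing wrong? What kind of performance should I expect when inserting a large graph?

Insert code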

for (Thing t : things) {
    List<ValuePair> properties = parseThing(t);
    String uid = createUid(t);

    try (Transaction tx = graphDb.beginTx()) {

        Node node = graphDb.createNode();
        node.setProperty("uid", uid);

        for (ValuePair vp : properties) {
            node.setProperty(vp.getName(), vp.getValue());
        }

        tx.success();
    }
}

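(At first I added a DynamicLabel when creating each node, but that was even slower. Is it feasible to use labels if you want good performance when inserting nodes?)

Configuration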

neo4j.properties

################################################################
# Neo4j
#
# neo4j.properties - database tuning parameters
#
################################################################

# Enable this to be able to upgrade a store from an older version.
#allow_store_upgrade=true

# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 75% of RAM minus the max Java heap size.
dbms.pagecache.memory=4g

# Enable this to specify a parser other than the default one.
#cypher_parser_version=2.0

# Keep logical logs, helps debugging but uses more disk space, enabled for
# legacy reasons. To limit space needed to store historical logs use values such
# as: "7 days" or "100M size" instead of "true".
#keep_logical_logs=7 days

# Autoindexing

# Enable auto-indexing for nodes, default is false.
#node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled.
#node_keys_indexable=name,age

# Enable auto-indexing for relationships, default is false.
#relationship_auto_indexing=true

# The relationship property keys to be auto-indexed, if enabled.
#relationship_keys_indexable=name,age

# Enable shell server so that remote clients can connect via Neo4j shell.
#remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
#remote_shell_host=127.0.0.1
# The port the shell will listen on, default is 1337.
#remote_shell_port=1337

# The type of cache to use for nodes and relationships.
cache_type=hpc

cache.memory_ratio=70

# Maximum size of the heap memory to dedicate to the cached nodes.
node_cache_size=2g
#relationship_cache_size=6g

# Maximum size of the heap memory to dedicate to the cached relationships.
#relationship_cache_size=

# Enable online backups to be taken from this database.
online_backup_enabled=true

# Port to listen to for incoming backup requests.
online_backup_server=127.0.0.1:6362


# Uncomment and specify these lines for running Neo4j in High Availability mode.
# See the High availability setup tutorial for more details on these settings
# http://neo4j.com/docs/2.2.0/ha-setup-tutorial.html

# ha.server_id is the number of each instance in the HA cluster. It should be
# an integer (e.g. 1), and should be unique for each cluster instance.
#ha.server_id=

# ha.initial_hosts is a comma-separated list (without spaces) of the host:port
# where the ha.cluster_server of all instances will be listening. Typically
# this will be the same for all cluster instances.
#ha.initial_hosts=192.168.0.1:5001,192.168.0.2:5001,192.168.0.3:5001

# IP and port for this instance to listen on, for communicating cluster status
# information with other instances (also see ha.initial_hosts). The IP
# must be the configured IP address for one of the local interfaces.
#ha.cluster_server=192.168.0.1:5001

# IP and port for this instance to listen on, for communicating transaction
# data with other instances (also see ha.initial_hosts). The IP
# must be the configured IP address for one of the local interfaces.
#ha.server=192.168.0.1:6001

# The interval at which slaves will pull updates from the master. Comment out
# the option to disable periodic pulling of updates. Unit is seconds.
ha.pull_interval=10

# Amount of slaves the master will try to push a transaction to upon commit
# (default is 1). The master will optimistically continue and not fail the
# transaction even if it fails to reach the push factor. Setting this to 0 will
# increase write performance when writing through master but could potentially
# lead to branched data (or loss of transaction) if the master goes down.
#ha.tx_push_factor=1

# Strategy the master will use when pushing data to slaves (if the push factor
# is greater than 0). There are two options available "fixed" (default) or
# "round_robin". Fixed will start by pushing to slaves ordered by server id
# (highest first) improving performance since the slaves only have to cache up
# one transaction at a time.
#ha.tx_push_strategy=fixed

# Policy for how to handle branched data.
#branched_data_policy=keep_all

# Clustering timeouts
# Default timeout.
#ha.default_timeout=5s

# How often heartbeat messages should be sent. Defaults to ha.default_timeout.
#ha.heartbeat_interval=5s

# Timeout for heartbeats between cluster members. Should be at least twice that of ha.heartbeat_interval.
#heartbeat_timeout=11s

neo4j-server.properties

################################################################
# Neo4j
#
# neo4j-server.properties - runtime operational settings
#
################################################################

#***************************************************************
# Server configuration
#***************************************************************

# location of the database directory
org.neo4j.server.database.location=data/graph.db

# Low-level graph engine tuning file
org.neo4j.server.db.tuning.properties=conf/neo4j.properties

# Database mode
# Allowed values:
# HA - High Availability
# SINGLE - Single mode, default.
# To run in High Availability mode, configure the neo4j.properties config file, then uncomment this line:
#org.neo4j.server.database.mode=HA

# Let the webserver only listen on the specified IP. Default is localhost (only
# accept local connections). Uncomment to allow any connection. Please see the
# security section in the neo4j manual before modifying this.
#org.neo4j.server.webserver.address=0.0.0.0

# Require (or disable the requirement of) auth to access Neo4j
dbms.security.auth_enabled=true

#
# HTTP Connector
#

# http port (for all data, administrative, and UI access)
org.neo4j.server.webserver.port=7474

#
# HTTPS Connector
#

# Turn https-support on/off
org.neo4j.server.webserver.https.enabled=true

# https port (for all data, administrative, and UI access)
org.neo4j.server.webserver.https.port=7473

# Certificate location (auto generated if the file does not exist)
org.neo4j.server.webserver.https.cert.location=conf/ssl/snakeoil.cert

# Private key location (auto generated if the file does not exist)
org.neo4j.server.webserver.https.key.location=conf/ssl/snakeoil.key

# Internally generated keystore (don't try to put your own
# keystore there, it will get deleted when the server starts)
org.neo4j.server.webserver.https.keystore.location=data/keystore

# Comma separated list of JAX-RS packages containing JAX-RS resources, one
# package name for each mountpoint. The listed package names will be loaded
# under the mountpoints specified. Uncomment this line to mount the
# org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
# neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
# http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
#org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged

org.neo4j.server.thirdparty_jaxrs_classes=my.project.package=/mypath

#*****************************************************************
# HTTP logging configuration
#*****************************************************************

# HTTP logging is disabled. HTTP logging can be enabled by setting this
# property to 'true'.
org.neo4j.server.http.log.enabled=false

# Logging policy file that governs how HTTP log output is presented and
# archived. Note: changing the rollover and retention policy is sensible, but
# changing the output format is less so, since it is configured to use the
# ubiquitous common log format
org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml

#*****************************************************************
# Administration client configuration
#*****************************************************************

# location of the servers round-robin database directory. possible values:
# - absolute path like /var/rrd
# - path relative to the server working directory like data/rrd
# - commented out, will default to the database data directory.
org.neo4j.server.webadmin.rrdb.location=data/rrd

neo4j-wrapper.conf

#********************************************************************
# Property file references
#********************************************************************

wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties

#********************************************************************
# JVM Parameters
#********************************************************************

wrapper.java.additional.1=-XX:+UseConcMarkSweepGC
wrapper.java.additional.2=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional.3=-XX:-OmitStackTraceInFastThrow
wrapper.java.additional.4=-XX:hashCode=5

# Remote JMX monitoring, uncomment and adjust the following lines as needed.
# Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
# the shipped configuration contains only a read only role called 'monitor' with password 'Neo4j'.
# For more details, see: http://download.oracle.com/javase/7/docs/technotes/guides/management/agent.html
# On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
# and have permissions set to 0600.
# For details on setting these file permissions on Windows see:
#     http://docs.oracle.com/javase/7/docs/technotes/guides/management/security-windows.html
#wrapper.java.additional=-Dcom.sun.management.jmxremote.port=3637
#wrapper.java.additional=-Dcom.sun.management.jmxremote.authenticate=true
#wrapper.java.additional=-Dcom.sun.management.jmxremote.ssl=false
#wrapper.java.additional=-Dcom.sun.management.jmxremote.password.file=conf/jmx.password
#wrapper.java.additional=-Dcom.sun.management.jmxremote.access.file=conf/jmx.access

# Some systems cannot discover host name automatically, and need this line configured:
#wrapper.java.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME

# Uncomment the following lines to enable garbage collection logging
#wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
#wrapper.java.additional=-XX:+PrintGCDetails
#wrapper.java.additional=-XX:+PrintGCDateStamps
#wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
#wrapper.java.additional=-XX:+PrintPromotionFailure
#wrapper.java.additional=-XX:+PrintTenuringDistribution

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=15000
wrapper.java.maxmemory=15000

#********************************************************************
# Wrapper settings
#********************************************************************
# path is relative to the bin dir
wrapper.pidfile=../data/neo4j-server.pid

#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
#  using this configuration file has been installed as a service.
#  Please uninstall the service before modifying this section.  The
#  service can then be reinstalled.

# Name of the service
wrapper.name=neo4j

# User account to be used for linux installs. Will default to current
# user if not set.
wrapper.user=

#********************************************************************
# Other Neo4j system properties
#********************************************************************
wrapper.java.additional=-Dneo4j.ext.udc.source=zip

wrapper.java.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Xdebug-Xnoagent-Djava.compiler=NONE-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

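I would be glad if you could help me with this!

Best answer

You need to create multiple nodes per transaction; otherwise the transaction overhead will account for most of the run time.

Try it like this: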

try (Transaction tx = graphDb.beginTx()) {

    for (Thing t : things) {

        List<ValuePair> properties = parseThing(t);
        String uid = createUid(t);

        Node node = graphDb.createNode();
        node.setProperty("uid", uid);

        for (ValuePair vp : properties) {
            node.setProperty(vp.getName(), vp.getValue());
        }
    }

    tx.success();
}
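With about 2 million nodes, a single transaction has to hold all of its state on the heap until it commits, so a common middle ground is to commit in batches of a few thousand nodes. The following is a minimal sketch of that idea, not part of the original answer: the `BatchedInsert` class, the `BatchWork` interface, and the batch size of 10,000 in the usage comment are all assumptions for illustration, and the batch size would need tuning. The helper itself has no Neo4j dependency; the commented usage shows where the question's transaction code would go.

```java
import java.util.List;

// Hypothetical helper (not a Neo4j API): splits a list into consecutive
// batches so that each batch can be written in its own transaction.
final class BatchedInsert {

    /** Callback that processes one batch, e.g. inside a single transaction. */
    interface BatchWork<T> {
        void run(List<T> batch);
    }

    /**
     * Hands consecutive sublists of at most batchSize items to work.
     * Returns the number of batches that were processed.
     */
    static <T> int forEachBatch(List<T> items, int batchSize, BatchWork<T> work) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            work.run(items.subList(start, end)); // a view, no copying
            batches++;
        }
        return batches;
    }
}

// Assumed usage with the names from the question (Thing, graphDb, parseThing,
// createUid). The lambda needs Java 8; on Java 7 use an anonymous BatchWork.
//
// BatchedInsert.forEachBatch(things, 10_000, batch -> {
//     try (Transaction tx = graphDb.beginTx()) {
//         for (Thing t : batch) {
//             Node node = graphDb.createNode();
//             node.setProperty("uid", createUid(t));
//             for (ValuePair vp : parseThing(t)) {
//                 node.setProperty(vp.getName(), vp.getValue());
//             }
//         }
//         tx.success(); // commits when the try block closes
//     }
// });
```

Each batch commits independently, so a failure part-way through leaves the earlier batches in the database; whether that is acceptable depends on the import.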

Regarding java - poor performance when inserting nodes into Neo4j using the Java API, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31335756/
