gpt4 book ai didi

java - 使用 java api 在 Neo4j 中插入节点时性能不佳

转载 作者:搜寻专家 更新时间:2023-10-30 20:09:00 24 4
gpt4 key购买 nike

我正在尝试将大约 200 万个节点插入到 Neo4j 中,但遇到了性能问题。

我正在使用带有用 java 编写的服务器扩展的 neo4j enterprise 2.2.0。我的电脑有一个 ssd、32gb 内存、Intel Core i7 cpu 并且正在运行 Windows 8。我运行一个独立版本的服务器并通过运行 bin 文件夹中的 Neo4j.bat 来启动它。

现在插入 10 000 个没有关系的节点大约需要 25 秒(我稍后需要添加关系,但当时有一个问题)。

我认为这是配置问题,所以我尝试了一些设置,但性能没有变化。我觉得奇怪的是,即使我在 neo4j-wrapper.conf 中将 initmemory 和 maxmemory 设置为 15000,java 进程最多也只能分配 3gb。

我在下面附上了我的代码和配置,有人知道我做错了什么吗?插入大图时我应该期待什么样的性能?

插入代码

for (Thing t : things) {
List<ValuePair> properties = parseThing(t);
String uid = createUid(t);

try (Transaction tx = graphDb.beginTx()) {

Node node = graphDb.createNode();
node.setProperty("uid", uid);

for (ValuePair vp : properties) {
node.setProperty(vp.getName(), vp.getValue());
}

tx.success();
}
}

(起初我在创建节点时添加了一个DynamicLabel,但它更慢。如果你想要在插入节点时获得良好的性能,是否可以使用标签?)

配置

neo4j.properties

################################################################
# Neo4j
#
# neo4j.properties - database tuning parameters
#
################################################################

# Enable this to be able to upgrade a store from an older version.
#allow_store_upgrade=true

# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 75% of RAM minus the max Java heap size.
dbms.pagecache.memory=4g

# Enable this to specify a parser other than the default one.
#cypher_parser_version=2.0

# Keep logical logs, helps debugging but uses more disk space, enabled for
# legacy reasons To limit space needed to store historical logs use values such
# as: "7 days" or "100M size" instead of "true".
#keep_logical_logs=7 days

# Autoindexing

# Enable auto-indexing for nodes, default is false.
#node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled.
#node_keys_indexable=name,age

# Enable auto-indexing for relationships, default is false.
#relationship_auto_indexing=true

# The relationship property keys to be auto-indexed, if enabled.
#relationship_keys_indexable=name,age

# Enable shell server so that remote clients can connect via Neo4j shell.
#remote_shell_enabled=true
# The network interface IP the shell will listen on (use 0.0.0 for all interfaces).
#remote_shell_host=127.0.0.1
# The port the shell will listen on, default is 1337.
#remote_shell_port=1337

# The type of cache to use for nodes and relationships.
cache_type=hpc

cache.memory_ratio=70

# Maximum size of the heap memory to dedicate to the cached nodes.
node_cache_size=2g
#relationship_cache_size=6g

# Maximum size of the heap memory to dedicate to the cached relationships.
#relationship_cache_size=

# Enable online backups to be taken from this database.
online_backup_enabled=true

# Port to listen to for incoming backup requests.
online_backup_server=127.0.0.1:6362


# Uncomment and specify these lines for running Neo4j in High Availability mode.
# See the High availability setup tutorial for more details on these settings
# http://neo4j.com/docs/2.2.0/ha-setup-tutorial.html

# ha.server_id is the number of each instance in the HA cluster. It should be
# an integer (e.g. 1), and should be unique for each cluster instance.
#ha.server_id=

# ha.initial_hosts is a comma-separated list (without spaces) of the host:port
# where the ha.cluster_server of all instances will be listening. Typically
# this will be the same for all cluster instances.
#ha.initial_hosts=192.168.0.1:5001,192.168.0.2:5001,192.168.0.3:5001

# IP and port for this instance to listen on, for communicating cluster status
# information iwth other instances (also see ha.initial_hosts). The IP
# must be the configured IP address for one of the local interfaces.
#ha.cluster_server=192.168.0.1:5001

# IP and port for this instance to listen on, for communicating transaction
# data with other instances (also see ha.initial_hosts). The IP
# must be the configured IP address for one of the local interfaces.
#ha.server=192.168.0.1:6001

# The interval at which slaves will pull updates from the master. Comment out
# the option to disable periodic pulling of updates. Unit is seconds.
ha.pull_interval=10

# Amount of slaves the master will try to push a transaction to upon commit
# (default is 1). The master will optimistically continue and not fail the
# transaction even if it fails to reach the push factor. Setting this to 0 will
# increase write performance when writing through master but could potentially
# lead to branched data (or loss of transaction) if the master goes down.
#ha.tx_push_factor=1

# Strategy the master will use when pushing data to slaves (if the push factor
# is greater than 0). There are two options available "fixed" (default) or
# "round_robin". Fixed will start by pushing to slaves ordered by server id
# (highest first) improving performance since the slaves only have to cache up
# one transaction at a time.
#ha.tx_push_strategy=fixed

# Policy for how to handle branched data.
#branched_data_policy=keep_all

# Clustering timeouts
# Default timeout.
#ha.default_timeout=5s

# How often heartbeat messages should be sent. Defaults to ha.default_timeout.
#ha.heartbeat_interval=5s

# Timeout for heartbeats between cluster members. Should be at least twice that of ha.heartbeat_interval.
#heartbeat_timeout=11s

neo4j-server.properties

################################################################
# Neo4j
#
# neo4j-server.properties - runtime operational settings
#
################################################################

#***************************************************************
# Server configuration
#***************************************************************

# location of the database directory
org.neo4j.server.database.location=data/graph.db

# Low-level graph engine tuning file
org.neo4j.server.db.tuning.properties=conf/neo4j.properties

# Database mode
# Allowed values:
# HA - High Availability
# SINGLE - Single mode, default.
# To run in High Availability mode, configure the neo4j.properties config file, then uncomment this line:
#org.neo4j.server.database.mode=HA

# Let the webserver only listen on the specified IP. Default is localhost (only
# accept local connections). Uncomment to allow any connection. Please see the
# security section in the neo4j manual before modifying this.
#org.neo4j.server.webserver.address=0.0.0.0

# Require (or disable the requirement of) auth to access Neo4j
dbms.security.auth_enabled=true

#
# HTTP Connector
#

# http port (for all data, administrative, and UI access)
org.neo4j.server.webserver.port=7474

#
# HTTPS Connector
#

# Turn https-support on/off
org.neo4j.server.webserver.https.enabled=true

# https port (for all data, administrative, and UI access)
org.neo4j.server.webserver.https.port=7473

# Certificate location (auto generated if the file does not exist)
org.neo4j.server.webserver.https.cert.location=conf/ssl/snakeoil.cert

# Private key location (auto generated if the file does not exist)
org.neo4j.server.webserver.https.key.location=conf/ssl/snakeoil.key

# Internally generated keystore (don't try to put your own
# keystore there, it will get deleted when the server starts)
org.neo4j.server.webserver.https.keystore.location=data/keystore

# Comma separated list of JAX-RS packages containing JAX-RS resources, one
# package name for each mountpoint. The listed package names will be loaded
# under the mountpoints specified. Uncomment this line to mount the
# org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
# neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
# http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
#org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged

org.neo4j.server.thirdparty_jaxrs_classes=my.project.package=/mypath

#*****************************************************************
# HTTP logging configuration
#*****************************************************************

# HTTP logging is disabled. HTTP logging can be enabled by setting this
# property to 'true'.
org.neo4j.server.http.log.enabled=false

# Logging policy file that governs how HTTP log output is presented and
# archived. Note: changing the rollover and retention policy is sensible, but
# changing the output format is less so, since it is configured to use the
# ubiquitous common log format
org.neo4j.server.http.log.config=conf/neo4j-http-logging.xml

#*****************************************************************
# Administration client configuration
#*****************************************************************

# location of the servers round-robin database directory. possible values:
# - absolute path like /var/rrd
# - path relative to the server working directory like data/rrd
# - commented out, will default to the database data directory.
org.neo4j.server.webadmin.rrdb.location=data/rrd

neo4j-wrapper.conf

#********************************************************************
# Property file references
#********************************************************************

wrapper.java.additional=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional=-Dlog4j.configuration=file:conf/log4j.properties

#********************************************************************
# JVM Parameters
#********************************************************************

wrapper.java.additional.1=-XX:+UseConcMarkSweepGC
wrapper.java.additional.2=-XX:+CMSClassUnloadingEnabled
wrapper.java.additional.3=-XX:-OmitStackTraceInFastThrow
wrapper.java.additional.4=-XX:hashCode=5

# Remote JMX monitoring, uncomment and adjust the following lines as needed.
# Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
# the shipped configuration contains only a read only role called 'monitor' with password 'Neo4j'.
# For more details, see: http://download.oracle.com/javase/7/docs/technotes/guides/management/agent.html
# On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
# and have permissions set to 0600.
# For details on setting these file permissions on Windows see:
# http://docs.oracle.com/javase/7/docs/technotes/guides/management/security-windows.html
#wrapper.java.additional=-Dcom.sun.management.jmxremote.port=3637
#wrapper.java.additional=-Dcom.sun.management.jmxremote.authenticate=true
#wrapper.java.additional=-Dcom.sun.management.jmxremote.ssl=false
#wrapper.java.additional=-Dcom.sun.management.jmxremote.password.file=conf/jmx.password
#wrapper.java.additional=-Dcom.sun.management.jmxremote.access.file=conf/jmx.access

# Some systems cannot discover host name automatically, and need this line configured:
#wrapper.java.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME

# Uncomment the following lines to enable garbage collection logging
#wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
#wrapper.java.additional=-XX:+PrintGCDetails
#wrapper.java.additional=-XX:+PrintGCDateStamps
#wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
#wrapper.java.additional=-XX:+PrintPromotionFailure
#wrapper.java.additional=-XX:+PrintTenuringDistribution

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
wrapper.java.initmemory=15000
wrapper.java.maxmemory=15000

#********************************************************************
# Wrapper settings
#********************************************************************
# path is relative to the bin dir
wrapper.pidfile=../data/neo4j-server.pid

#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
# using this configuration file has been installed as a service.
# Please uninstall the service before modifying this section. The
# service can then be reinstalled.

# Name of the service
wrapper.name=neo4j

# User account to be used for linux installs. Will default to current
# user if not set.
wrapper.user=

#********************************************************************
# Other Neo4j system properties
#********************************************************************
wrapper.java.additional=-Dneo4j.ext.udc.source=zip

wrapper.java.additional=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Xdebug-Xnoagent-Djava.compiler=NONE-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

如果你能帮我解决这个问题,我会很高兴的!

最佳答案

您需要在交易中创建多个节点,否则交易开销会消耗大部分时间。

请这样尝试:

try (Transaction tx = graphDb.beginTx()) {

for (Thing t : things) {

List<ValuePair> properties = parseThing(t);
String uid = createUid(t);

Node node = graphDb.createNode();
node.setProperty("uid", uid);

for (ValuePair vp : properties) {
node.setProperty(vp.getName(), vp.getValue());
}
}

tx.success();
}

关于java - 使用 java api 在 Neo4j 中插入节点时性能不佳,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31335756/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com