gpt4 book ai didi

java - mongo java 驱动程序比 shell 慢(15 倍)

转载 作者:行者123 更新时间:2023-12-02 00:59:27 25 4
gpt4 key购买 nike

尝试仅转储 _id 列。使用 mongo shell,大约 2 分钟即可完成:

time mongoexport -h localhost -d db1 -c collec1 -f _id -o u.text --csv
connected to: localhost
exported 68675826 records
real 2m20.970s

使用java大约需要30分钟:java -cp mongo-test- assembly-0.1.jar com.poshmark.Test

class Test {
public static void main(String[] args) {


MongoClient mongoClient = new MongoClient("localhost");
MongoDatabase database = mongoClient.getDatabase("db1");
MongoCollection<Document> collection = database.getCollection("collec1");

MongoCursor<Document> iterator = collection.find().projection(new Document("_id", 1)).iterator();
while (iterator.hasNext()) {
System.out.println(iterator.next().toString());
}
}

}

盒子上的 CPU 使用率很低,没有看到任何网络延迟问题,因为两个测试都在同一个盒子上运行

更新:

使用 Files.newBufferedWriter 而不是 System.out.println 但最终获得相同的性能。

查看 db.currentOp(),让我认为 mongo 正在访问磁盘,因为它有太多 numYields

{
"inprog" : [
{
"desc" : "conn8636699",
"threadId" : "0x79a70c0",
"connectionId" : 8636699,
"opid" : 1625079940,
"active" : true,
"secs_running" : 12,
"microsecs_running" : NumberLong(12008522),
"op" : "getmore",
"ns" : "users.users",
"query" : {
"_id" : {
"$exists" : true
}
},
"client" : "10.1.166.219:60324",
"numYields" : 10848,
"locks" : {

},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(21696)
},
"acquireWaitCount" : {
"r" : NumberLong(26)
},
"timeAcquiringMicros" : {
"r" : NumberLong(28783)
}
},
"MMAPV1Journal" : {
"acquireCount" : {
"r" : NumberLong(10848)
},
"acquireWaitCount" : {
"r" : NumberLong(5)
},
"timeAcquiringMicros" : {
"r" : NumberLong(40870)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(10848)
}
},
"Collection" : {
"acquireCount" : {
"R" : NumberLong(10848)
}
}
}
}
]
}

最佳答案

问题出在 STDOUT 中。

Printing to stdout is not inherently slow. It is the terminal you work with that is slow.

https://stackoverflow.com/a/3860319/3710490

The disk appears to be faster, because it is highly buffered.

The terminal, on the other hand, does little or no buffering: each individual print / write(line) waits for the full write (i.e. display to output device) to complete.

https://stackoverflow.com/a/3857543/3710490

我已经用足够的相似数据集重现了您的用例。

mongoexportFILE

$ time "C:\Program Files\MongoDB\Server\4.2\bin\mongoexport.exe" -h localhost -d test -c collec1 -f _id -o u.text --csv
2020-03-28T13:03:01.550+0100 csv flag is deprecated; please use --type=csv instead
2020-03-28T13:03:02.433+0100 connected to: mongodb://localhost/
2020-03-28T13:03:03.479+0100 [........................] test.collec1 0/21028330 (0.0%)
2020-03-28T13:05:02.934+0100 [########################] test.collec1 21028330/21028330 (100.0%)
2020-03-28T13:05:02.934+0100 exported 21028330 records

real 2m1,936s
user 0m0,000s
sys 0m0,000s

mongoexport 到 STDOUT

$ time "C:\Program Files\MongoDB\Server\4.2\bin\mongoexport.exe" -h localhost -d test -c collec1 -f _id --csv
2020-03-28T14:43:16.479+0100 connected to: mongodb://localhost/
2020-03-28T14:43:16.545+0100 [........................] test.collec1 0/21028330 (0.0%)
2020-03-28T14:53:02.361+0100 [########################] test.collec1 21028330/21028330 (100.0%)
2020-03-28T14:53:02.361+0100 exported 21028330 records

real 9m45,962s
user 0m0,015s
sys 0m0,000s

JAVA 到文件

$ time "C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" -jar mongo-test-assembly-0.1.jar FILE
Wasted time for [FILE] - 271,57 sec

real 4m32,174s
user 0m0,015s
sys 0m0,000s

JAVA 到 STDOUT 到文件

$ time "C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" -jar mongo-test-assembly-0.1.jar SYSOUT > u.text

real 6m50,962s
user 0m0,015s
sys 0m0,000s

JAVA 到 STDOUT

$ time "C:\Program Files\Java\jdk1.8.0_211\bin\java.exe" -jar mongo-test-assembly-0.1.jar SYSOUT > u.text

Wasted time for [SYSOUT] - 709,33 sec

real 11m51,276s
user 0m0,000s
sys 0m0,015s

Java代码

long init = System.currentTimeMillis();
try (MongoClient mongoClient = new MongoClient("localhost");
BufferedWriter writer = Files.newBufferedWriter(Files.createTempFile("benchmarking", ".tmp"))) {

MongoDatabase database = mongoClient.getDatabase("test");
MongoCollection<Document> collection = database.getCollection("collec1");

MongoCursor<Document> iterator = collection.find().projection(new Document("_id", 1)).iterator();
while (iterator.hasNext()) {
if ("SYSOUT".equals(args[0])) {
System.out.println(iterator.next().get("_id"));

} else {
writer.write(iterator.next().get("_id") + "\n");
}
}
} catch (Exception e) {
e.printStackTrace();
}
long end = System.currentTimeMillis();
System.out.println(String.format("Wasted time for [%s] - %.2f sec", args[0], (end - init) / 1_000.));

关于java - mongo java 驱动程序比 shell 慢(15 倍),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/60896988/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com