java - Why is String.strip() 5 times faster than String.trim() for blank strings in Java 11

Reposted · Author: 搜寻专家 · Updated: 2023-10-30 21:03:41

I came across an interesting scenario. For some reason, in Java 11 strip() is much faster than trim() on a blank string (one containing only whitespace).

Benchmark

import static java.util.concurrent.TimeUnit.MILLISECONDS;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

public class Test {

    public static final String TEST_STRING = "   "; // 3 whitespaces

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testTrim() {
        TEST_STRING.trim();
    }

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testStrip() {
        TEST_STRING.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}

Results

# Run complete. Total time: 00:04:16

Benchmark         Mode  Cnt           Score          Error  Units
Test.testStrip   thrpt  200  2067457963.295 ± 12353310.918  ops/s
Test.testTrim    thrpt  200   402307182.894 ±  4559641.554  ops/s

strip() clearly outperforms trim() here, by a factor of about 5 (roughly 2.07 billion vs. 0.40 billion ops/s).

For a non-blank string, however, the results are almost the same:

import static java.util.concurrent.TimeUnit.MILLISECONDS;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

public class Test {

    public static final String TEST_STRING = " Test String ";

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testTrim() {
        TEST_STRING.trim();
    }

    @Benchmark
    @Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
    @Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
    @BenchmarkMode(Mode.Throughput)
    public void testStrip() {
        TEST_STRING.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}


# Run complete. Total time: 00:04:16

Benchmark         Mode  Cnt          Score         Error  Units
Test.testStrip   thrpt  200  126939018.461 ± 1462665.695  ops/s
Test.testTrim    thrpt  200  141868439.680 ± 1243136.707  ops/s

How can this be? Is it a bug, or am I doing something wrong?


Test environment

  • CPU - Intel Xeon E3-1585L v5 @ 3.00 GHz
  • OS - Windows 7 SP1, 64-bit
  • JVM - Oracle JDK 11.0.1
  • Benchmark - JMH v1.19

Update

I added more performance tests for different strings (empty, blank, etc.).

Benchmark

import static java.util.concurrent.TimeUnit.SECONDS;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 5, time = 1, timeUnit = SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = SECONDS)
@Fork(value = 3)
@BenchmarkMode(Mode.Throughput)
public class Test {

    private static final String BLANK = "";        // Blank
    private static final String EMPTY = "   ";     // 3 spaces
    private static final String ASCII = " abc ";   // ASCII characters only
    private static final String UNICODE = " абв "; // Russian (Cyrillic) characters

    private static final String BIG = EMPTY.concat("Test".repeat(100)).concat(EMPTY);

    @Benchmark
    public void blankTrim() {
        BLANK.trim();
    }

    @Benchmark
    public void blankStrip() {
        BLANK.strip();
    }

    @Benchmark
    public void emptyTrim() {
        EMPTY.trim();
    }

    @Benchmark
    public void emptyStrip() {
        EMPTY.strip();
    }

    @Benchmark
    public void asciiTrim() {
        ASCII.trim();
    }

    @Benchmark
    public void asciiStrip() {
        ASCII.strip();
    }

    @Benchmark
    public void unicodeTrim() {
        UNICODE.trim();
    }

    @Benchmark
    public void unicodeStrip() {
        UNICODE.strip();
    }

    @Benchmark
    public void bigTrim() {
        BIG.trim();
    }

    @Benchmark
    public void bigStrip() {
        BIG.strip();
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}

Results

# Run complete. Total time: 00:05:23

Benchmark            Mode  Cnt           Score          Error  Units
Test.asciiStrip     thrpt   15   356846913.133 ±  4096617.178  ops/s
Test.asciiTrim      thrpt   15   371319467.629 ±  4396583.099  ops/s
Test.bigStrip       thrpt   15    29058105.304 ±  1909323.104  ops/s
Test.bigTrim        thrpt   15    28529199.298 ±  1794655.012  ops/s
Test.blankStrip     thrpt   15  1556405453.206 ± 67230630.036  ops/s
Test.blankTrim      thrpt   15  1587932109.069 ± 19457780.528  ops/s
Test.emptyStrip     thrpt   15  2126290275.733 ± 23402906.719  ops/s
Test.emptyTrim      thrpt   15   406354680.805 ± 14359067.902  ops/s
Test.unicodeStrip   thrpt   15    37320438.099 ±   399421.799  ops/s
Test.unicodeTrim    thrpt   15    88226653.577 ±  1628179.578  ops/s

The test environment is the same.

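There is only one other interesting finding: for the string containing Unicode (Cyrillic) characters, trim() is faster than strip() (about 88 million vs. 37 million ops/s).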
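One plausible contributor to the Unicode result, offered as an assumption rather than a measured fact, is that the two methods test for whitespace differently. Per the String Javadoc, trim() removes characters whose code is at most U+0020, while strip() removes characters for which Character.isWhitespace returns true, which may be a more expensive check per character. The sketch below only demonstrates that semantic difference and measures nothing; the class and method names (WhitespaceRules, trimRemoves, stripRemoves) are hypothetical.

```java
/** Hypothetical helpers illustrating the two whitespace rules used by trim() and strip(). */
final class WhitespaceRules {

    private WhitespaceRules() {}

    /** trim()'s rule: any char whose code is <= U+0020 (space plus ASCII control characters). */
    static boolean trimRemoves(char c) {
        return c <= ' ';
    }

    /** strip()'s rule: Character.isWhitespace, which also covers Unicode space separators. */
    static boolean stripRemoves(char c) {
        return Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        char emSpace = '\u2003'; // EM SPACE, a Unicode space separator
        char nul = '\u0000';     // NUL, an ASCII control character

        // prints "false true": trim() keeps EM SPACE, strip() removes it
        System.out.println(trimRemoves(emSpace) + " " + stripRemoves(emSpace));
        // prints "true false": trim() removes NUL, strip() keeps it
        System.out.println(trimRemoves(nul) + " " + stripRemoves(nul));

        String padded = emSpace + "abc" + emSpace;
        System.out.println(padded.trim().length());  // prints 5: the EM SPACEs survive trim()
        System.out.println(padded.strip().length()); // prints 3: strip() returns "abc"
    }
}
```

In other words, the two methods are not interchangeable even when both happen to return the same result for plain ASCII input.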

Best answer

On OpenJDK 11.0.1, String.strip() (actually StringLatin1.strip()) optimizes stripping down to an empty String by returning an interned String constant:

public static String strip(byte[] value) {
    int left = indexOfNonWhitespace(value);
    if (left == value.length) {
        return "";
    }
    // ... (rest of the method omitted)
}
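To make that early return concrete, here is a minimal sketch that follows the shape of the strip() path described above, but on an ordinary String. It is not the JDK source: the names StripModel and stripLike are hypothetical, and it checks chars with Character.isWhitespace instead of scanning the JDK's internal Latin-1 byte array.

```java
/** Hypothetical model of the JDK 11 strip() control flow; not the real JDK source. */
final class StripModel {

    private StripModel() {}

    /** Returns the shared "" literal when the input is empty or all whitespace. */
    static String stripLike(String s) {
        int length = s.length();
        int left = 0;
        // Scan forward to the first non-whitespace character.
        while (left < length && Character.isWhitespace(s.charAt(left))) {
            left++;
        }
        if (left == length) {
            // All whitespace: return the interned literal, so nothing is allocated.
            return "";
        }
        int right = length;
        // Scan backward to just past the last non-whitespace character.
        while (right > left && Character.isWhitespace(s.charAt(right - 1))) {
            right--;
        }
        if (left == 0 && right == length) {
            // Nothing to strip: hand back the original instance, as String.strip() does.
            return s;
        }
        // Otherwise a new String is created for the stripped range.
        return s.substring(left, right);
    }

    public static void main(String[] args) {
        System.out.println(stripLike("   ") == "");      // prints true
        System.out.println("[" + stripLike("  a ") + "]"); // prints [a]
    }
}
```

The identity check prints true because every "" literal in the JVM refers to the same interned String instance, which is exactly why this path avoids an allocation.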

String.trim() (actually StringLatin1.trim()) allocates a new String object whenever it removes anything, even when the result is empty. In your example st = 3 and len = 3, so

return ((st > 0) || (len < value.length)) ?
    newString(value, st, len - st) : null;

will copy the array behind the scenes and create a new String object:

return new String(Arrays.copyOfRange(val, index, index + len),
                  LATIN1);
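The trim() path quoted above can be modelled the same way. This is a hypothetical sketch (TrimModel and trimLike are illustrative names) that follows the JDK 11.0.1 logic shown in the answer, not necessarily other releases: once st > 0, a fresh String is built even when the remaining range is empty. It copies chars rather than the JDK's internal Latin-1 bytes, and toCharArray() adds an allocation of its own, so it models the control flow, not the cost.

```java
import java.util.Arrays;

/** Hypothetical model of the JDK 11 Latin-1 trim() path; not the real JDK source. */
final class TrimModel {

    private TrimModel() {}

    /** Removes leading/trailing chars <= U+0020, copying whenever anything was removed. */
    static String trimLike(String s) {
        char[] value = s.toCharArray();
        int len = value.length;
        int st = 0;
        // Skip leading characters <= ' ' (space and ASCII control characters).
        while (st < len && value[st] <= ' ') {
            st++;
        }
        // Skip trailing characters <= ' '; for all-whitespace input st == len already.
        while (st < len && value[len - 1] <= ' ') {
            len--;
        }
        if (st == 0 && len == value.length) {
            return s; // nothing to trim: String.trim() returns the original instance
        }
        // Mirrors newString(value, st, len - st): always a fresh String, even if empty.
        return new String(Arrays.copyOfRange(value, st, len));
    }

    public static void main(String[] args) {
        String trimmed = trimLike("   ");
        System.out.println(trimmed.isEmpty()); // prints true
        System.out.println(trimmed == "");     // prints false: a distinct, newly allocated object
    }
}
```

Compared with the strip() model, the only structural difference for an all-whitespace input is that this path ends in a copy plus a new String instead of returning the interned literal.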

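Based on the above assumption, we can update the benchmark to also compare a non-blank String, which should not be affected by the String.strip() optimization mentioned above: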

import static java.util.concurrent.TimeUnit.MILLISECONDS;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@BenchmarkMode(Mode.Throughput)
public class MyBenchmark {

    public static final String EMPTY_STRING = "   "; // 3 whitespaces
    public static final String NOT_EMPTY_STRING = " a "; // 3 whitespaces with a in the middle

    @Benchmark
    public void testEmptyTrim() {
        EMPTY_STRING.trim();
    }

    @Benchmark
    public void testEmptyStrip() {
        EMPTY_STRING.strip();
    }

    @Benchmark
    public void testNotEmptyTrim() {
        NOT_EMPTY_STRING.trim();
    }

    @Benchmark
    public void testNotEmptyStrip() {
        NOT_EMPTY_STRING.strip();
    }

}

Running it shows no significant difference between strip() and trim() for a non-blank String. Oddly enough, trimming down to an empty String is still the slowest:

Benchmark                        Mode  Cnt           Score           Error  Units
MyBenchmark.testEmptyStrip      thrpt  100  1887848947.416 ± 257906287.634  ops/s
MyBenchmark.testEmptyTrim       thrpt  100   206638996.217 ±  57952310.906  ops/s
MyBenchmark.testNotEmptyStrip   thrpt  100   399701777.916 ±   2429785.818  ops/s
MyBenchmark.testNotEmptyTrim    thrpt  100   385144724.856 ±   3928016.232  ops/s

A similar question, "java - Why is String.strip() 5 times faster than String.trim() for blank strings in Java 11", can be found on Stack Overflow: https://stackoverflow.com/questions/53640184/
