java - Regex performance: Java vs C++11

Reposted. Author: 塔克拉玛干. Last updated: 2023-11-03 08:05:30

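I am learning regular expressions in C++ and Java, so I ran a performance test of C++11 regex against Java regex using the same expression and the same number of inputs. Strangely, Java regex is faster than C++11 regex. Is there anything wrong with my code? Please correct me.

Java code: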

import java.util.regex.*;

public class Main {
    private final static int MAX = 1_000_000;

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        Pattern p = Pattern.compile("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
        for (int i = 0; i < MAX; i++) {
            p.matcher("abcd_ed123.t12y@haha.com").matches();
        }
        long end = System.currentTimeMillis();
        System.out.print(end-start);
    }
}

C++ code:

#include <iostream>
#include <Windows.h>
#include <regex>

using namespace std;

int main()
{
    long long start = GetTickCount64();
    regex pat("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
    for (long i = 0; i < 1000000; i++) {
        regex_match("abcd_ed123.t12y@haha.com", pat);
    }
    long long end = GetTickCount64();
    cout << end - start;
    return 0;
}

Performance:

Java -> 1003ms
C++ -> 124360ms

Best answer

Making the C++ example portable:

#include <iostream>
#include <chrono>
#include <regex>

using C = std::chrono::high_resolution_clock;
using namespace std::chrono_literals;

int main()
{
    auto start = C::now();
    std::regex pat("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
    for (long i = 0; i < 1000000; i++) {
        regex_match("abcd_ed123.t12y@haha.com", pat);
    }
    std::cout << (C::now() - start)/1.0ms;
}

On Linux, using clang++ -std=c++14 -march=native -O3 -o clang ./test.cpp, I get 595.970 ms. See also: Live On Wandbox

Java runs in 561 ms on the same machine.

Update: Boost Regex runs much faster; see the comparative benchmark below.

Caveat: synthetic benchmarks like these are very prone to error: the compiler might sense that no observable side effects are done, and optimize the whole loop out, just to give an example.
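One way to guard against that, as a minimal sketch: count the successful matches and use the count afterwards, so every iteration feeds into an observable value. The helper name count_matches is hypothetical and not part of the original answer.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: runs the match `iterations` times and returns how many
// calls succeeded. Because the caller consumes the return value, the compiler
// cannot treat the loop body as dead code.
std::size_t count_matches(const std::regex& pat, const std::string& input,
                          std::size_t iterations) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        if (std::regex_match(input, pat)) {
            ++hits;
        }
    }
    return hits;
}
```

Calling count_matches(pat, "abcd_ed123.t12y@haha.com", 1000000) between the two C::now() readings and printing the returned count next to the elapsed time gives the loop an observable side effect. It does not address other sources of benchmark error, such as warm-up or timer resolution.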

More fun: adding Boost to the mix

Using Boost 1.67 and the Nonius Micro-Benchmarking Framework

[Benchmark chart; image not preserved]

We can see that Boost's Regex implementations are considerably faster.

Explore the detailed sample data interactively online: https://plot.ly/~sehe/25/

Code Used

#include <iostream>
#include <regex>
#include <boost/regex.hpp>
#include <boost/xpressive/xpressive_static.hpp>
#define NONIUS_RUNNER
#include <nonius/benchmark.h++>
#include <nonius/main.h++>

template <typename Re>
void test(Re const& re) {
    regex_match("abcd_ed123.t12y@haha.com", re);
}

static const std::regex std_normal("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
static const std::regex std_optimized("^[\\w._]+@\\w+\\.[a-zA-Z]+$", std::regex::ECMAScript | std::regex::optimize);
static const boost::regex boost_normal("^[\\w._]+@\\w+\\.[a-zA-Z]+$");
static const boost::regex boost_optimized("^[\\w._]+@\\w+\\.[a-zA-Z]+$", static_cast<boost::regex::flag_type>(boost::regex::ECMAScript | boost::regex::optimize));

static const auto boost_xpressive = []{
    using namespace boost::xpressive;
    return cregex { bos >> +(_w | '.' | '_') >> '@' >> +_w >> '.' >> +alpha >> eos };
}();

NONIUS_BENCHMARK("std_normal", [] { test(std_normal); })
NONIUS_BENCHMARK("std_optimized", [] { test(std_optimized); })
NONIUS_BENCHMARK("boost_normal", [] { test(boost_normal); })
NONIUS_BENCHMARK("boost_optimized", [] { test(boost_optimized); })
NONIUS_BENCHMARK("boost_xpressive", [] { test(boost_xpressive); })
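For readers without Nonius or Boost installed, a rough hand-rolled comparison of the two std::regex variants could look like the sketch below. The names Timing and time_matches are hypothetical, and this is far less rigorous than Nonius: there is no warm-up, no repeated sampling, and no statistical analysis.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: elapsed wall time plus the number of successful
// matches, which keeps the loop's work observable to the optimizer.
struct Timing {
    double milliseconds;
    std::size_t hits;
};

// Times `iterations` calls to std::regex_match against a single input.
Timing time_matches(const std::regex& re, const std::string& input,
                    std::size_t iterations) {
    using Clock = std::chrono::steady_clock;
    std::size_t hits = 0;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        hits += std::regex_match(input, re) ? 1 : 0;
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return {elapsed.count(), hits};
}
```

Passing std_normal and std_optimized to time_matches with the same input and iteration count gives two Timing values to compare. Treat the numbers as indicative only; they will vary with the compiler, the standard library, and the optimization flags.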

Note: Here's the output of the HotSpot JVM JIT compiler:

This was generated using:

LD_PRELOAD=/home/sehe/Projects/stackoverflow/fcml-1.1.3/example/hsdis/.libs/libhsdis-amd64.so ./jre1.8.0_171/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Main 2>&1 > disasm.a

Regarding java - Regex performance: Java vs C++11, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50610414/
