performance - 提高字符串操作的速度-6ren

performance - 提高字符串操作的速度

转载作者：行者123 更新时间：2023-12-04 14:46:20

这是一个后续问题，有点像这个问题:Write an efficient string replacement function? .

在(尽管遥远的) future ，我希望能够进行自然语言处理。当然，因此字符串操作的速度很重要。偶然地，我偶然发现了这个测试:http://raid6.com.au/~onlyjob/posts/arena/ - 所有测试都有偏差，这次也不异常(exception)。然而，它对我提出了重要的问题。所以我写了一些测试来看看我是怎么做的:

这是我的第一次尝试(我称之为#A):

#A

(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
     and initial = (get-internal-real-time)
     for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
     for ln = (* (length addidtion) i)
     for accumulated = addidtion
     then (loop with concatenated = (concatenate 'string accumulated addidtion)
             for start = (search "efgh" concatenated)
             while start do (replace concatenated "____" :start1 start)
             finally (return concatenated))
     when (zerop (mod ln (* 1024 256))) do
       (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))

(test)

对结果感到困惑，我尝试使用 cl-ppcre - 我不知道我希望什么，但结果非常糟糕......这是我用于测试的代码:

#B

(ql:quickload "cl-ppcre")

(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion = (concatenate 'string "abcdefgh" "efghefgh")
     and initial = (get-internal-real-time)
     for i from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
     for ln = (* (length addidtion) i)
     for accumulated = addidtion
     then (cl-ppcre:regex-replace-all "efgh" (concatenate 'string accumulated addidtion) "____")
     when (zerop (mod ln (* 1024 256))) do
       (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))

(test)

那么，为了避免一些概括，我决定自己写一个，虽然有点天真:

#C

(defun replace-all (input match replacement)
  (declare (type string input match replacement)
           (optimize (debug 0) (safety 0) (speed 3)))
  (loop with pattern fixnum = (1- (length match))
     with i fixnum = pattern
     with j fixnum = i
     with len fixnum = (length input) do
       (cond
         ((>= i len) (return input))
         ((zerop j)
          (loop do
               (setf (aref input i) (aref replacement j) i (1+ i))
               (if (= j pattern)
                   (progn (incf i pattern) (return))
                   (incf j))))
         ((char= (aref input i) (aref match j))
          (decf i) (decf j))
         (t (setf i (+ i 1 (- pattern j)) j pattern)))))

(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
     and initial = (get-internal-real-time)
     for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
     for ln fixnum = (* (length addidtion) i)
     for accumulated string = addidtion
     then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
     when (zerop (mod ln (* 1024 256))) do
       (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))

(test)

几乎和 cl-ppcre 一样慢!现在，这太不可思议了!我在这里找不到任何会导致如此糟糕性能的东西......而且它仍然很糟糕:(

意识到到目前为止标准函数的性能最好，我查看了 SBCL 源代码，经过一些阅读，我想出了这个:

#D

(defun replace-all (input match replacement &key (start 0))
  (declare (type simple-string input match replacement)
           (type fixnum start)
           (optimize (debug 0) (safety 0) (speed 3)))
  (loop with input-length fixnum = (length input)
     and match-length fixnum = (length match)
     for i fixnum from 0 below (ceiling (the fixnum (- input-length start)) match-length) do
       (loop with prefix fixnum = (+ start (the fixnum (* i match-length)))
          for j fixnum from 0 below match-length do
            (when (<= (the fixnum (+ prefix j match-length)) input-length)
              (loop for k fixnum from (+ prefix j) below (the fixnum (+ prefix j match-length))
                 for n fixnum from 0 do
                   (unless (char= (aref input k) (aref match n)) (return))
                 finally
                   (loop for m fixnum from (- k match-length) below k
                      for o fixnum from 0 do
                        (setf (aref input m) (aref replacement o))
                      finally
                        (return-from replace-all
                          (replace-all input match replacement :start k))))))
       finally (return input)))

(defun test ()
  (declare (optimize (debug 0) (safety 0) (speed 3)))
  (loop with addidtion string = (concatenate 'string "abcdefgh" "efghefgh")
     and initial = (get-internal-real-time)
     for i fixnum from 0 below (+ (* (/ 1024 (length addidtion)) 1024 4) 1000)
     for ln fixnum = (* (length addidtion) i)
     for accumulated string = addidtion
     then (replace-all (concatenate 'string accumulated addidtion) "efgh" "____")
     when (zerop (mod ln (* 1024 256))) do
       (format t "~&~f s | ~d Kb" (/ (- (get-internal-real-time) initial) 1000) (/ ln 1024)))
  (values))

(test)

最后，我可以获胜，尽管对标准库的性能只有一小部分 - 但与几乎所有其他东西相比，它仍然非常非常糟糕......

以下是结果表:

| SBCL #A   | SBCL #B   | SBCL #C    | SBCL #D   | C gcc 4 -O3 | String size |
|-----------+-----------+------------+-----------+-------------+-------------|
| 17.463 s  | 166.254 s | 28.924 s   | 16.46 s   | 1 s         | 256 Kb      |
| 68.484 s  | 674.629 s | 116.55 s   | 63.318 s  | 4 s         | 512 Kb      |
| 153.99 s  | gave up   | 264.927 s  | 141.04 s  | 10 s        | 768 Kb      |
| 275.204 s | . . . . . | 474.151 s  | 251.315 s | 17 s        | 1024 Kb     |
| 431.768 s | . . . . . | 745.737 s  | 391.534 s | 27 s        | 1280 Kb     |
| 624.559 s | . . . . . | 1079.903 s | 567.827 s | 38 s        | 1536 Kb     |

现在，问题是:我做错了什么？这是 Lisp 字符串固有的东西吗？这可能可以通过……什么来缓解？

从长远来看，我什至会考虑编写一个专门的字符串处理库。如果问题不是我的错误代码，而是实现。这样做有意义吗？如果是，你会建议用什么语言来做？

编辑:只是为了记录，我现在正在尝试使用这个库: https://github.com/Ramarren/ropes处理字符串连接。不幸的是，它没有替换功能，并且进行多次替换并不是一件容易的事。但是当我有东西时，我会更新这篇文章。

我试图稍微改变怀远的变体以使用数组的填充指针而不是字符串连接(以实现类似于 Paulo Madeira 建议的 StringBuilder 的东西。它可能可以进一步优化，但我不确定类型/哪个方法会更快/是否值得为 * 和 + 重新定义类型以让它们只在 fixnum 或 signed-byte 上运行。无论如何，这是代码和基准:

(defun test/e ()
  (declare (optimize speed))
  (labels ((min-power-of-two (num)
             (declare (type fixnum num))
             (decf num)
             (1+
              (progn
                (loop for i fixnum = 1 then (the (unsigned-byte 32) (ash i 1))
                   while (< i 17) do
                     (setf num
                           (logior
                            (the fixnum
                                 (ash num (the (signed-byte 32)
                                               (+ 1 (the (signed-byte 32)
                                                         (lognot i)))))) num))) num)))
           (join (x y)
             (let ((capacity (array-dimension x 0))
                   (desired-length (+ (length x) (length y)))
                   (x-copy x))
               (declare (type fixnum capacity desired-length)
                        (type (vector character) x y x-copy))
               (when (< capacity desired-length)
                 (setf x (make-array
                          (min-power-of-two desired-length)
                          :element-type 'character
                          :fill-pointer desired-length))
                 (replace x x-copy))
               (replace x y :start1 (length x))
               (setf (fill-pointer x) desired-length) x))
           (seek (old str pos)
             (let ((q (position (aref old 0) str :start pos)))
               (and q (search old str :start2 q))))
           (subs (str old new)
             (loop for p = (seek old str 0) then (seek old str p)
                while p do (replace str new :start1 p))
             str))
    (declare (inline min-power-of-two join seek subs)
             (ftype (function (fixnum) fixnum) min-power-of-two))
    (let* ((builder
            (make-array 16 :element-type 'character
                        :initial-contents "abcdefghefghefgh"
                        :fill-pointer 16))
           (ini (get-internal-real-time)))
      (declare (type (vector character) builder))
      (loop for i fixnum below (+ 1000 (* 4 1024 1024 (/ (length builder))))
         for j = builder then
           (subs (join j builder) "efgh" "____")
         for k fixnum = (* (length builder) i)
         when (= 0 (mod k (* 1024 256)))
         do (format t "~&~8,2F sec ~8D kB"
                    (/ (- (get-internal-real-time) ini) 1000)
                    (/ k 1024))))))

    1.68 sec      256 kB
    6.63 sec      512 kB
   14.84 sec      768 kB
   26.35 sec     1024 kB
   41.01 sec     1280 kB
   59.55 sec     1536 kB
   82.85 sec     1792 kB
  110.03 sec     2048 kB

最佳答案

瓶颈是search功能，这可能未在 SBCL 中优化。以下版本使用position帮助它跳过不可能的区域，并且速度大约是我机器上的 #A 版本的 10 倍:

(defun test/e ()
  (declare (optimize speed))
  (labels ((join (x y)
             (concatenate 'simple-base-string x y))
           (seek (old str pos)
             (let ((q (position (char old 0) str :start pos)))
               (and q (search old str :start2 q))))
           (subs (str old new)
             (loop for p = (seek old str 0) then (seek old str p)
                   while p do (replace str new :start1 p))
             str))
    (declare (inline join seek subs))
    (let* ((str (join "abcdefgh" "efghefgh"))
           (ini (get-internal-real-time)))
      (loop for i below (+ 1000 (* 4 1024 1024 (/ (length str))))
            for j = str then (subs (join j str) "efgh" "____")
            for k = (* (length str) i)
            when (= 0 (mod k (* 1024 256)))
              do (format t "~&~8,2F sec ~8D kB"
                         (/ (- (get-internal-real-time) ini) 1000)
                         (/ k 1024))))))

关于performance - 提高字符串操作的速度，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/17629896/

文章推荐： ckan - CKAN Extensions 可以有 Controller 吗？

文章推荐： r - 如何在 impress.js 中使用 slidify

文章推荐： r - 在多行文本中创建上标

文章推荐： dynamics-crm-2011 - Dynamics CRM 2011 SQL MetadataSchema 主名称属性

java - Struts2 操作 > JSP > 操作
我正在努力做到这一点在我的操作中从数据库获取对象列表(确定) 在 JSP 上打印(确定) 此列表作为 JSP 中的可编辑表出现。我想修改然后将其提交回同一操作以将其保存在我的数据库中(失败。当我使用
linq - 不支持嵌套查询。操作 1 ='UnionAll' 操作 2 ='MultiStreamNest'
我有以下形式的 Linq to Entities 查询: var x = from a in SomeData where ... some conditions ... select
c# - 不支持嵌套查询。操作 1 ='UnionAll' 操作 2 ='MultiStreamNest'
我有以下查询。 var query = Repository.Query() .Where(p => !p.IsDeleted && p.Article.ArticleSections.Cou
java - Jtable ListSelectionListener 不响应 jtable 操作，而是响应同一个类中的另一个 jtable 操作
我正在编写一个应用程序包，其中包含一个主类，其中主方法与GUI类分开，GUI类包含一个带有jtabbedpane的jframe，它有两个选项卡，第一个选项卡包含一个jtable，称为jtable1，第
c# - LINQ 嵌套数组和三元运算符。不支持嵌套查询。操作 1 ='Case' 操作 2 ='Collect'
以下代码产生错误 The nested query is not supported. Operation1='Case' Operation2='Collect' 问题是我做错了什么？我该如何解决？
Redis哨兵中的C#操作
我已经为 HA redis 集群(2 个副本、1 个主节点、3 个哨兵)设置了本地 docker 环境。只有哨兵暴露端口(10021、10022、10023)。我使用的是 stackexchange
液体模板过滤器中的日期数学/操作
我正在 Desk.com 中构建一个“集成 URL”，它使用 Shopify Liquid 模板过滤器语法。对于开始日期为 7 天前而结束日期为现在的查询，此 URL 需要包含“开始日期”和“结束日期
Python为什么不支持 i++/i--操作
你一定想过。然而情况却不理想，python中只能使用类似于 i++/i--等操作。 python中的自增操作下面代码几乎是所有程序员在python中进行自增(减)操作的常用
GitHub 操作 - 将分支名称显示为构建名称
我需要在每个使用 github 操作的手动构建中显示分支。例如:https://gyazo.com/2131bf83b0df1e2157480e5be842d4fb 我应该显示分支而不是一个。最佳答
Perl qr//操作
我有一个关于 Perl qr 运算符的问题: #!/usr/bin/perl -w &mysplit("a:b:c", /:/); sub mysplit { my($str, $patt
uml - ArgoUML 操作
我已经使用 ArgoUML 创建了一个 ERD(实体关系图)，我希望在一个类中创建两个操作，它们都具有 void 返回类型。但是，我只能创建一个返回 void 类型的操作。例如: 我能够将 book
关于拉取请求和主分支的 Github 操作
Github 操作仍处于测试阶段并且很新，但我希望有人可以提供帮助。我认为可以在主分支和拉取请求上运行 github 操作，如下所示: on: pull_request push: b
用于记录的 Twilio 操作
我正在尝试创建一个 Twilio 工作流来调用电话并记录用户所说的内容。为此，我正在使用 Record，但我不确定要在 action 参数中放置什么。尽管我知道 Twilio 会发送有关调用该 UR
OpenGL 模板缓冲区 OR 操作？
我不确定这是否可行，但值得一试。我正在使用模板缓冲区来减少使用此算法的延迟渲染器中光体积的过度绘制(当相机位于体积之外时): 使用廉价的着色器，将深度测试设置为 LEQUAL 绘制背面，将它们标记在模
用于复制和重命名文件的 GitHub 操作
有没有聪明的方法来复制和重命名文件通过 GitHub 操作？我想将一些自述文件复制到 /docs文件夹(:= 同一个 repo，不是远程的!)，它们将根据它们的 frontmatter 重命名
PowerShell CSV 操作
我有一个 .csv 文件，其中第一列包含用户名。它们采用 FirstName LastName 的形式。我想获取 FirstName 并将 LastName 的第一个字符添加到它上面，然后删除空格。然
Sitecore - 操作 URL
Sitecore 根据 Sitecore 树中定义的项目名称生成 URL， http://samplewebsite/Pages/Sample Page 但我们的客户有兴趣降低所有 URL(页面/示例
单击按钮时的 Angularjs 操作
我正在尝试进行一些计算，但是一旦我输入金额，它就会完成。我只是希望通过单击按钮而不是自动发生这种情况。到目前为止我做了什么: Angular JS - programming-fr
将文件从一个存储库复制到另一个存储库的 github 操作
我的公司创建了一种在环境之间移动文件的复杂方法，现在我们希望将某些构建的 JS 文件(已转换和缩小)从一个 github 存储库移动到另一个。使用 github 操作可以实现这一点吗？最佳答案最简
java - JSONArray 操作
在我的代码中，我创建了一个 JSONArray 对象。并向 JSONArray 对象添加了两个 JSONObject。我使用的是 json-simple-1.1.jar。我的代码是 package j

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

performance - 提高字符串操作的速度