clojure - Best way to lazily collapse multiple contiguous items of a sequence into a single item

Reposted by: 行者123 · Updated: 2023-12-02 23:13:34

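[Note: the title and text have been heavily edited to make it clearer that I'm not specifically after strings, but after general sequences and lazy processing of them.]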

Taking a sequence of characters/a string as an example, say I want to turn a string like

"\t a\r s\td \t \r \n f \r\n"

into

" a s d f "

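More generally, I want to turn every run of contiguous whitespace (or of any other arbitrary set of items) in a sequence into a single item, and do so lazily.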

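I came up with the following partition-by/mapcat combination, but I wonder whether there are simpler or otherwise better ways (readability, performance, etc.) to accomplish the same thing.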

(defn is-wsp?
  [c]
  (if (#{\space \tab \newline \return} c) true))

(defn collapse-wsp
  [coll]
  (mapcat
   (fn [[first-elem :as s]]
     (if (is-wsp? first-elem) [\space] s))
   (partition-by is-wsp? coll)))

In action:

=> (apply str (collapse-wsp "\t    a\r          s\td  \t \r \n         f \r\n"))
" a s d f "

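Update: I used strings/character sequences/whitespace as an example, but what I actually want is a generic function on sequences of any type that collapses any number of contiguous items belonging to a predefined set into a single predefined item. I'm particularly interested in whether there are better alternatives to partition-by/mapcat, not in whether the "string" special case can be optimized.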
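To make the partition-by/mapcat combination concrete, here is a small sketch of its intermediate step. It repeats the question's is-wsp? so it runs on its own; the input "a \tb" and the names groups and collapsed are arbitrary choices for illustration.

```clojure
;; Same predicate as in the question: true for whitespace chars, nil otherwise
(defn is-wsp?
  [c]
  (if (#{\space \tab \newline \return} c) true))

;; partition-by starts a new group whenever (is-wsp? c) changes value,
;; so each run of whitespace ends up in a group of its own
(def groups (partition-by is-wsp? "a \tb"))
;; groups => ((\a) (\space \tab) (\b))

;; collapse-wsp then turns every whitespace group into [\space],
;; keeps every other group unchanged, and mapcat concatenates the results
(def collapsed
  (mapcat (fn [[first-elem :as s]] (if (is-wsp? first-elem) [\space] s))
          groups))
;; (apply str collapsed) => "a b"
```

This also shows where the redundant check comes from: is-wsp? runs on every item inside partition-by, and then once more on the first item of each group inside mapcat.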

Update 2:

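Here is a fully lazy version. I fear the version above isn't fully lazy, and it also performs redundant is-wsp? checks. I generalized the parameter names etc., so that it no longer looks like something that could easily be replaced by a String.whatever() call: it is about arbitrary sequences.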

(defn lazy-collapse
  ([coll is-collapsable-item? collapsed-item-representation]
   (lazy-collapse coll is-collapsable-item? collapsed-item-representation false))
  ([coll is-collapsable-item? collapsed-item-representation in-collapsable-segment?]
   (let [step (fn [coll in-collapsable-segment?]
                (when-let [item (first coll)]
                  (if (is-collapsable-item? item)
                    (if in-collapsable-segment?
                      ;; already inside a run: skip this item
                      (recur (rest coll) true)
                      ;; first item of a run: emit the representation once
                      (cons collapsed-item-representation
                            (lazy-collapse (rest coll) is-collapsable-item?
                                           collapsed-item-representation true)))
                    ;; any other item is passed through unchanged
                    (cons item
                          (lazy-collapse (rest coll) is-collapsable-item?
                                         collapsed-item-representation false)))))]
     (lazy-seq (step coll in-collapsable-segment?)))))

This is fast and fully lazy, but I'd like to be able to express it more concisely, since I'm quite lazy myself.

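Benchmarks of the lazy collapsers so far: readability is easy to judge by looking at the code, but to see how they compare in terms of performance, here are my benchmarks. I first check that each function does what it's supposed to do, then print how long it takes to:

1. create the lazy seq 1M times
2. create the lazy seq and take the first item 1M times
3. create the lazy seq and take the second item 1M times
4. create the lazy seq and take the last item (i.e. fully realize the lazy seq) 1M times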

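Tests 1 to 3 are meant to gauge laziness at least a little. I ran the tests several times, and the execution times did not vary significantly.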

user=> (map
   (fn [collapse]
     (println (class collapse) (str "|" (apply str (collapse test-str is-wsp? \space)) "|"))
     (time (dotimes [_ 1000000] (collapse test-str is-wsp? \space)))
     (time (dotimes [_ 1000000] (first (collapse test-str is-wsp? \space))))
     (time (dotimes [_ 1000000] (second (collapse test-str is-wsp? \space))))
     (time (dotimes [_ 1000000] (last (collapse test-str is-wsp? \space)))))
   [collapse-overthink collapse-smith collapse-normand lazy-collapse])

user$collapse_overthink | a s d f |
"Elapsed time: 153.490591 msecs"
"Elapsed time: 3064.721629 msecs"
"Elapsed time: 4337.932487 msecs"
"Elapsed time: 24797.222682 msecs"

user$collapse_smith | a s d f |
"Elapsed time: 141.474904 msecs"
"Elapsed time: 812.998848 msecs"
"Elapsed time: 2112.331739 msecs"
"Elapsed time: 10750.224816 msecs"

user$collapse_normand | a s d f |
"Elapsed time: 314.978309 msecs"
"Elapsed time: 1423.779761 msecs"
"Elapsed time: 1669.660257 msecs"
"Elapsed time: 8074.759077 msecs"

user$lazy_collapse | a s d f |
"Elapsed time: 169.906088 msecs"
"Elapsed time: 638.030401 msecs"
"Elapsed time: 1195.445016 msecs"
"Elapsed time: 6050.945856 msecs"

Bottom line so far: the nicest code is the slowest, and the ugliest code is the fastest. I'm pretty sure it doesn't have to be this way...
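The benchmark harness above calls map for its side effects. That works at the REPL only because printing the returned seq forces it; loaded from a file, map would stay unrealized and none of the timings would run. A sketch of the same four measurements driven by doseq, assuming every collapser takes (coll pred rep) as in the question; bench-collapsers and its iteration-count parameter n are hypothetical names:

```clojure
;; Hypothetical helper: the question's four timings, driven by doseq instead of map
(defn bench-collapsers
  [collapsers test-str pred rep n]
  ;; doseq is eager, so every timing runs even if the result is never printed
  (doseq [collapse collapsers]
    ;; sanity check: show the collapsed result between pipes
    (println (class collapse) (str "|" (apply str (collapse test-str pred rep)) "|"))
    ;; 1. only create the lazy seq
    (time (dotimes [_ n] (collapse test-str pred rep)))
    ;; 2. and 3. realize just the first or the second item
    (time (dotimes [_ n] (first (collapse test-str pred rep))))
    (time (dotimes [_ n] (second (collapse test-str pred rep))))
    ;; 4. realize the whole seq
    (time (dotimes [_ n] (last (collapse test-str pred rep))))))
```

With the question's definitions loaded, the original run would be (bench-collapsers [collapse-overthink collapse-smith collapse-normand lazy-collapse] test-str is-wsp? \space 1000000).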

Best answer

Here is my fastest solution so far (basically the same as M Smith's, but without destructuring):

(defn collapse [xs pred rep]
  (when-let [x (first xs)]
    (lazy-seq 
      (if (pred x)
        (cons rep (collapse (drop-while pred (rest xs)) pred rep))
        (cons x (collapse (rest xs) pred rep))))))
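One caveat for arbitrary sequences: (when-let [x (first xs)] ...) treats a nil or false item as the end of the input, so (collapse [1 nil 2] nil? :x) returns (1) rather than (1 :x 2). The when-let also runs before lazy-seq, so the call is one step eager and an empty input yields nil instead of a LazySeq. A minimal sketch of a variant that tests (seq xs) inside lazy-seq instead; collapse-safe is a hypothetical name, and it has not been benchmarked against the version above:

```clojure
(defn collapse-safe
  "Lazily replaces each run of consecutive items matching pred with a single rep.
  nil and false are handled as ordinary items, not as end-of-sequence."
  [xs pred rep]
  (lazy-seq
    ;; (seq xs) is nil only at the real end of the sequence
    (when-let [s (seq xs)]
      (let [x (first s)]
        (if (pred x)
          ;; emit rep once, then skip the rest of this run
          (cons rep (collapse-safe (drop-while pred (rest s)) pred rep))
          ;; non-matching item: pass it through unchanged
          (cons x (collapse-safe (rest s) pred rep)))))))
```

For example, (collapse-safe [1 nil 2] nil? :x) gives (1 :x 2), and (collapse-safe [] nil? :x) gives an empty LazySeq.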

Here is a prettier solution, but it's 3 times slower(!) (effectively the same as SuperHorst's initial version...):

(defn collapse [col pred rep]
  (let [f (fn [[x & more :as xs]] (if (pred x) [rep] xs))]
    (mapcat f (partition-by #(if (pred %) true) col))))
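The #(if (pred %) true) wrapper matters when pred is a set, as in the is-ws? example: a set called as a function returns the matching element itself, and partition-by starts a new group whenever that return value changes. Without the wrapper, adjacent but different whitespace characters would land in separate groups and each would become its own rep. A small sketch of the difference (ws, split-groups and joined-groups are illustrative names):

```clojure
(def ws #{\space \tab})

;; Set as the key function: returns \space, then \tab, so the run is split
(def split-groups (partition-by ws " \tx"))
;; split-groups => ((\space) (\tab) (\x))

;; Normalized to true/nil: the whole whitespace run stays in one group
(def joined-groups (partition-by #(if (ws %) true) " \tx"))
;; joined-groups => ((\space \tab) (\x))
```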

Mini-benchmark (full code) output:

$ clj collapse.clj 
     SuperHorst: "Elapsed time: 58535.737037 msecs"
      Overthink: "Elapsed time: 70154.744605 msecs"
        M Smith: "Elapsed time: 89484.984606 msecs"
   Eric Normand: "Elapsed time: 83121.309838 msecs"

Example:

(def test-str "\t a\r      s\td \t \r \n          f \r\n")
(def is-ws? #{\space \tab \newline \return})

user=> (apply str (collapse test-str is-ws? \space))
" a s d f "

Used with a different type of seq:

user=> (collapse (range 1 110) #(= 2 (count (str %))) \X)
(1 2 3 4 5 6 7 8 9 \X 100 101 102 103 104 105 106 107 108 109)

It is fully lazy:

user=> (type (collapse test-str is-ws? \space))
clojure.lang.LazySeq
user=> (type (collapse (range 1 110) #(= 2 (count (str %))) \X))
clojure.lang.LazySeq
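The type check shows that a LazySeq comes back; a stronger check is to feed collapse an infinite sequence and take only a prefix, which would never return if the function realized its whole input. This sketch repeats the answer's fastest collapse so it runs on its own; the input (cycle [1 1 2]) and the name prefix are arbitrary choices:

```clojure
;; The answer's fastest version, repeated so this snippet is self-contained
(defn collapse [xs pred rep]
  (when-let [x (first xs)]
    (lazy-seq
      (if (pred x)
        (cons rep (collapse (drop-while pred (rest xs)) pred rep))
        (cons x (collapse (rest xs) pred rep))))))

;; (cycle [1 1 2]) is infinite: 1 1 2 1 1 2 ...
;; each run of 1s becomes a single :x, and take stops after 5 results
(def prefix (take 5 (collapse (cycle [1 1 2]) #(= 1 %) :x)))
;; prefix => (:x 2 :x 2 :x)
```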

Old, buggy version:

(defn collapse-bug [col pred rep]
  (let [f (fn [x] (if (pred x) rep x))]
    (map (comp f first) (partition-by f col))))

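The bug is that it also swallows consecutive equal items that don't match pred: f maps each such item to itself, so partition-by groups them together and only the first one survives (for example, "aab" comes out as "ab").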
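A small sketch reproducing the failure next to the fixed version from the top of the answer. The input "aa  b" and the names ws, buggy and fixed are arbitrary choices for illustration:

```clojure
;; The old version from the answer
(defn collapse-bug [col pred rep]
  (let [f (fn [x] (if (pred x) rep x))]
    (map (comp f first) (partition-by f col))))

;; The answer's fastest version, for comparison
(defn collapse [xs pred rep]
  (when-let [x (first xs)]
    (lazy-seq
      (if (pred x)
        (cons rep (collapse (drop-while pred (rest xs)) pred rep))
        (cons x (collapse (rest xs) pred rep))))))

(def ws #{\space \tab})

;; f maps \a to \a, so both \a's fall into one group and only one survives
(def buggy (apply str (collapse-bug "aa  b" ws \space)))
;; buggy => "a b"

(def fixed (apply str (collapse "aa  b" ws \space)))
;; fixed => "aa b"
```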

Regarding clojure - Best way to lazily collapse multiple contiguous items of a sequence into a single item, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6728567/
