
haskell - Running a Haskell script on a machine without GHC


This question may or may not be truly Haskell-specific, but it concerns a minor annoyance I ran into while working on a particular programming task.

I have written a program in Haskell that is essentially general for the kind of problem I want to solve, but it contains two situation-dependent components: a run-time estimation function for the script, derived from trial runs against a particular benchmark, and a filename transformation function tailored to the naming scheme of the files I am working with. Naturally, I would want to change the estimation function if I use the script on hardware whose performance differs from the benchmark, or if I find the estimates too conservative, and likewise I would want to be able to modify the filename transformation function if I need to work with files that follow a different naming scheme.

However, the (remote) machine on which I run the script has neither GHC nor runhaskell installed, so I have to modify, compile, and re-upload the code from my local machine, which is a bit of a hassle. My question is whether there is a simple way to change certain components of the code without recompiling, so that the changes are reflected the next time the program is invoked.

I apologize if my description is vague; the gory details are included below, because I did not want to clutter the question from the outset with details that might turn out to be unnecessary.

I wrote this code in Haskell mainly because it is the language in which I best know how to implement the approach; while I realize other languages might be more portable, I am not familiar enough with them to implement this without reading a great deal of documentation and going through many revisions. If the flexibility I want is impractical in Haskell, I can accept that, but I would rather be told that Haskell cannot do it than receive suggestions for other languages that can.

The gory details

I am writing code to run independent jobs on a load-sharing cluster, so I want to estimate as accurately as possible how long a given job will take, neither underestimating (which would get the job killed) nor overshooting (which would lower the job's priority). My time estimate is based on the size of the input to the job's program, and I have collected enough real data to derive an approximately quadratic relationship between size and time.

The way I currently assign a time estimate to each input, and thereby establish the job order, is with a Haskell script that parses the output of du, performs the calculation, and writes the resulting times to a new file, which the job-submission script then reads in a loop.

The jobs run on paired files that share part of a common name; the last common element I want to keep is an 's', after which neither name contains another 's' character. So I walk backwards through each name, dropping characters until I reach an 's'. My code is included below. It is liberally commented, which may help or may confuse; some of the comments are very specific to the task at hand.
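For illustration (using a hypothetical filename), the two task-specific functions defined in the listing below behave roughly like this in GHCi:

ghci> size2time "1234 sampleA--Hs--R1.fq.gz"
(85,"sampleA--Hs--R1.fq.gz")
ghci> sample (85,"sampleA--Hs--R1.fq.gz")
("sampleA--Hs",85)

That is, a 1234 MB input maps to floor(0.000025 * 1234^2 + 0.03 * 1234 + 10) = 85 minutes, and everything after the last 's' in the name is dropped.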

-- size2time.hs
-- A Haskell script to convert file sizes into job-times, based on observed job-times for
-- various file sizes
--
--
-- This file may be compiled via the following command:
-- > ghc size2time.hs
--
-- Should any edits be made, ensure that the compiled executable is updated accordingly
--
-- The executable is to be run with the following usage
--
-- > ./size2time inputfile outputfile
--
-- where inputfile is the name of a file whose first column contains the sizes, in MB, of each fq.gz
-- (including both paired-end reads), and whose second column contains the corresponding file names, as
-- generated by
--
-- > du -m $( ls DIR/*.fq.gz ) >inputfile
--
-- where DIR is the directory containing the fq.gz files
--
-- outputfile is the name of a file that will be created by the execution of this script, whose first
-- column will contain the run-time, in minutes, of the corresponding job (the times are based on
-- jobs run on Intel CPUs with 12 cores and 2GB of RAM, and therefore will potentially be
-- inapplicable to jobs run on CPUs of different manufacturers, with different numbers of cores,
-- and/or with different allocated RAM), and whose second column contains the scrubbed names of
-- the jobs to be run. The greater time-value for any given pair is used, with only one member of
-- each pair retained, as the file-names of each member of a pair are identical after scrubbing
--
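-- For illustration only (hypothetical directory and file names), an inputfile such as
--
-- > 1234 DIR/sampleA--Hs--R1.fq.gz
-- > 1300 DIR/sampleA--Hs--R2.fq.gz
--
-- would be expected to yield an outputfile containing the single line
--
-- > 91 DIR/sampleA--Hs
--
-- since the larger of the two estimates (91 minutes, for the 1300 MB member) is kept under
-- the shared scrubbed name
--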

-- import modules for command line arguments, list operations, map operations
import System.Environment
import Data.List
import qualified Data.Map as Map


main = do
    args <- getArgs                                        -- parse command line arguments: inputfile, outputfile, <ignored>
    let infile  = head args
        outfile = head . tail $ args
    contents <- readFile infile                            -- read the inputfile
    let sf     = lines contents                            -- split into lines
        tf     = map size2time sf                          -- perform size2time mapping
        st     = map sample tf                             -- scrub filename
        stu    = Map.toList . Map.fromListWith max $ st    -- take only the longer of the two times of the paired reads
        tsu    = map flip2 stu                             -- put time first
        stsu   = sort tsu                                  -- sort by time, ascending
        tsustr = map unwords . map (\(x,y) -> [show x, y]) $ stsu   -- convert back to strings
        tsulns = unlines tsustr                            -- join individual lines
    writeFile outfile tsulns                               -- write to the outputfile


{- given a string, with the size of a file and the name of the file,
- returns a tuple with the estimated job-time and the unmodified name
- of the file.
-
- The size-time conversion is extrapolated from experimental data,
- with only the upper extremes considered in order to prevent timeout,
- rounding in the quadratic term, and a linear-degree time padding added
- to allow for upper extremes. If modifications are to be made to any
- coefficients, it is recommended that only linear and constant terms be increased,
- and decreases should only be made after performing sufficient alignments to collect
- enough (file size)--(actual computation time) pairs to verify that the padding is excessive,
- and to determine coefficients that more closely follow the trend of the actual data, with
- the conditions that no data point must exceed the approximation curve, and that sufficient padding
- must be provided to allow for potential inconsistency in the time required for any given size of alignment.
-}
size2time :: String -> (Int,String)
size2time sfstring =
    let (size:file:[]) = words sfstring                     -- parses out size and filename
        x = fromIntegral (read size :: Int)                 -- floating point from numeric string
        time = floor $ 0.000025 * x ^ 2 + 0.03 * x + 10     -- apply floored conversion
        tfstring = (time,file)
    in tfstring



{-
- removes all characters in the file-name after 's', which properly scrubs files of the format
- *--Hs--R?.fq.gz, where the ? is either 1 or 2. For filenames formatted in different ways,
- or for alternative naming of the BAM file to be generated, this function must be modified
- to suit the scenario.
-}
sample :: (a,String) -> (String,a)
sample (x,f) = let s = reverse . dropWhile (/= 's') . reverse $ f
               in (s,x)

{-
- Reverses the order of a tuple, e.g. so that a Map may be made with a key to be found in the
- current second position of the tuple.
-}
flip2 :: (a,b) -> (b,a)
flip2 (x,y) = (y,x)

Best Answer

I don't think there is a clear-cut solution to your problem.

Without an interpreter or a compiler on the remote machine, there is no way to modify the Haskell source on that machine and then turn it into a form the machine can execute.

As others have said, perhaps you could implement a configuration file or command-line options so that the data likely to change can be specified at run time.
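A minimal sketch of that idea might look like the following, assuming a small plain-text configuration file (here called size2time.conf, an illustrative name) that holds the three coefficients and the scrub character, one per line; the helper names readConfig, size2time', and sample' are likewise only illustrative:

-- sketch: a variant of size2time.hs whose tunable pieces are read from a
-- configuration file at run time ("size2time.conf", its layout, and the primed
-- helper names are illustrative assumptions, not part of the original script)
--
-- size2time.conf is assumed to contain four lines:
--   0.000025        (quadratic coefficient)
--   0.03            (linear coefficient)
--   10              (constant padding, minutes)
--   s               (character at which filename scrubbing stops)

import System.Environment (getArgs)

data Config = Config { coefA, coefB, coefC :: Double, scrubChar :: Char }

-- parse the four-line configuration file described above
readConfig :: FilePath -> IO Config
readConfig path = do
    (a:b:c:ch:_) <- fmap lines (readFile path)
    return (Config (read a) (read b) (read c) (head ch))

-- the same conversion as size2time, with the coefficients taken from the Config
size2time' :: Config -> String -> (Int, String)
size2time' cfg sfstring =
    let (size:file:_) = words sfstring
        x = fromIntegral (read size :: Int)
    in (floor (coefA cfg * x ^ 2 + coefB cfg * x + coefC cfg), file)

-- the same scrubbing as sample, with the stop character taken from the Config
sample' :: Config -> (a, String) -> (String, a)
sample' cfg (x, f) = (reverse . dropWhile (/= scrubChar cfg) . reverse $ f, x)

main :: IO ()
main = do
    (conffile:infile:_) <- getArgs
    cfg <- readConfig conffile
    contents <- readFile infile
    mapM_ (print . sample' cfg . size2time' cfg) (lines contents)

The executable would then be compiled once and re-tuned on the remote machine simply by editing size2time.conf, with no recompilation needed.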

Alternatively, assuming your remote machine has gcc installed, you could have GHC compile the Haskell code to C on your local machine, transfer that to the remote machine, do your best to understand how it translated your code, and then make your changes to the C code and recompile it on the remote machine.

Regarding haskell - Running a Haskell script on a machine without GHC, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26062350/
