python - 在 Python 中嵌入低性能脚本语言-6ren

python - 在 Python 中嵌入低性能脚本语言

转载作者：IT老高更新时间：2023-10-28 21:11:55

25

4

我有一个网络应用程序。作为其中的一部分，我需要应用程序的用户能够编写(或复制和粘贴)非常简单的脚本来针对他们的数据运行。

脚本真的可以很简单，性能只是最次要的问题。我的意思是脚本的复杂性示例如下:

ratio = 1.2345678
minimum = 10

def convert(money)
    return money * ratio
end

if price < minimum
    cost = convert(minimum)
else
    cost = convert(price)
end

其中价格和成本是一个全局变量(我可以输入环境并在计算后访问)。

不过，我确实需要保证一些东西。

运行的任何脚本都无法访问 Python 环境。他们不能导入东西、调用我没有明确为他们公开的方法、读取或写入文件、生成线程等。我需要完全锁定。
我需要能够对脚本运行的“周期”数量进行硬性限制。循环在这里是一个通用术语。如果语言是字节编译的，则可能是 VM 指令。应用调用 Eval/Apply 循环。或者只是通过一些运行脚本的中央处理循环进行迭代。细节不如我能够在短时间内停止运行并向所有者发送电子邮件并说“您的脚本似乎不仅仅是将几个数字相加 - 将它们整理出来。”
它必须在 Vanilla 未打补丁的 CPython 上运行。

到目前为止，我一直在为这项任务编写自己的 DSL。我能做到。但我想知道我是否可以建立在巨人的肩膀上。是否有适用于 Python 的迷你语言可以做到这一点？

有很多 hacky Lisp 变体(甚至是我在 Github 上写的)，但我更喜欢非专业语法的东西(比如更多 C 或 Pascal)，因为我认为这是一个除了自己编码之外，我想要更成熟的东西。

有什么想法吗？

最佳答案

这是我对这个问题的看法。要求用户脚本在 vanilla CPython 中运行意味着您需要为您的迷你语言编写解释器，或者将其编译为 Python 字节码(或使用 Python 作为源语言)，然后在执行之前“清理”字节码。

基于用户可以编写的假设，我已经举了一个简单的例子他们在 Python 中编写的脚本，并且源代码和字节码可以足够通过从解析中过滤不安全语法的某种组合进行清理树和/或从字节码中删除不安全的操作码。

解决方案的第二部分要求用户脚本字节码是定期被看门狗任务中断，这将确保用户脚本不超过某些操作码限制，并且所有这些都可以在 vanilla CPython 上运行。

我的尝试总结，主要集中在问题的第二部分。

用户脚本是用 Python 编写的。
使用byteplay过滤和修改字节码。
检测用户的字节码以插入操作码计数器并调用上下文切换到看门狗任务的函数。
使用greenlet执行用户的字节码，带有yield切换在用户脚本和看门狗协程之间。
看门狗对操作码的数量实现了预设限制，在引发错误之前执行。

希望这至少朝着正确的方向发展。我有兴趣听听更多关于您的解决方案的信息。

lowperf.py的源代码:

# std
import ast
import dis
import sys
from pprint import pprint

# vendor
import byteplay
import greenlet

# bytecode snippet to increment our global opcode counter
INCREMENT = [
    (byteplay.LOAD_GLOBAL, '__op_counter'),
    (byteplay.LOAD_CONST, 1),
    (byteplay.INPLACE_ADD, None),
    (byteplay.STORE_GLOBAL, '__op_counter')
    ]

# bytecode snippet to perform a yield to our watchdog tasklet.
YIELD = [
    (byteplay.LOAD_GLOBAL, '__yield'),
    (byteplay.LOAD_GLOBAL, '__op_counter'),
    (byteplay.CALL_FUNCTION, 1),
    (byteplay.POP_TOP, None)
    ]

def instrument(orig):
    """
    Instrument bytecode.  We place a call to our yield function before
    jumps and returns.  You could choose alternate places depending on 
    your use case.
    """
    line_count = 0
    res = []
    for op, arg in orig.code:
        line_count += 1

        # NOTE: you could put an advanced bytecode filter here.

        # whenever a code block is loaded we must instrument it
        if op == byteplay.LOAD_CONST and isinstance(arg, byteplay.Code):
            code = instrument(arg)
            res.append((op, code))
            continue

        # 'setlineno' opcode is a safe place to increment our global 
        # opcode counter.
        if op == byteplay.SetLineno:
            res += INCREMENT
            line_count += 1

        # append the opcode and its argument
        res.append((op, arg))

        # if we're at a jump or return, or we've processed 10 lines of
        # source code, insert a call to our yield function.  you could 
        # choose other places to yield more appropriate for your app.
        if op in (byteplay.JUMP_ABSOLUTE, byteplay.RETURN_VALUE) \
                or line_count > 10:
            res += YIELD
            line_count = 0

    # finally, build and return new code object
    return byteplay.Code(res, orig.freevars, orig.args, orig.varargs,
        orig.varkwargs, orig.newlocals, orig.name, orig.filename,
        orig.firstlineno, orig.docstring)

def transform(path):
    """
    Transform the Python source into a form safe to execute and return
    the bytecode.
    """
    # NOTE: you could call ast.parse(data, path) here to get an
    # abstract syntax tree, then filter that tree down before compiling
    # it into bytecode.  i've skipped that step as it is pretty verbose.
    data = open(path, 'rb').read()
    suite = compile(data, path, 'exec')
    orig = byteplay.Code.from_code(suite)
    return instrument(orig)

def execute(path, limit = 40):
    """
    This transforms the user's source code into bytecode, instrumenting
    it, then kicks off the watchdog and user script tasklets.
    """
    code = transform(path)
    target = greenlet.greenlet(run_task)

    def watcher_task(op_count):
        """
        Task which is yielded to by the user script, making sure it doesn't
        use too many resources.
        """
        while 1:
            if op_count > limit:
                raise RuntimeError("script used too many resources")
            op_count = target.switch()

    watcher = greenlet.greenlet(watcher_task)
    target.switch(code, watcher.switch)

def run_task(code, yield_func):
    "This is the greenlet task which runs our user's script."
    globals_ = {'__yield': yield_func, '__op_counter': 0}
    eval(code.to_code(), globals_, globals_)

execute(sys.argv[1])

这是一个示例用户脚本user.py:

def otherfunc(b):
    return b * 7

def myfunc(a):
    for i in range(0, 20):
        print i, otherfunc(i + a + 3)

myfunc(2)

这是一个示例运行:

% python lowperf.py user.py

0 35
1 42
2 49
3 56
4 63
5 70
6 77
7 84
8 91
9 98
10 105
11 112
Traceback (most recent call last):
  File "lowperf.py", line 114, in <module>
    execute(sys.argv[1])
  File "lowperf.py", line 105, in execute
    target.switch(code, watcher.switch)
  File "lowperf.py", line 101, in watcher_task
    raise RuntimeError("script used too many resources")
RuntimeError: script used too many resources

关于python - 在 Python 中嵌入低性能脚本语言，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/5099043/

25

4

0

文章推荐： python - 尝试在 OSX 上安装 easy_install 时权限被拒绝

文章推荐： java - 如何让 liquibase 使用 slf4j 进行日志记录？

文章推荐： java - 为什么 Eclipse 不将编译器切换到 Java 8？

文章推荐： python - 如何使用 BeautifulSoup 找到评论标签 ？

带有重载提取器的 Scala 语言？
至少在某些 ML 系列语言中，您可以定义可以执行模式匹配的记录，例如http://learnyouahaskell.com/making-our-own-types-and-typeclasses -
用于并发编程的 .NET 语言
这可能是其他人已经看到的一个问题，但我正在尝试寻找一种专为(或支持)并发编程而设计的语言，该语言可以在 .net 平台上运行。我一直在 erlang 中进行辅助开发，以了解该语言，并且喜欢建立一个稳
ide - 语言+ IDE教学高中生？
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be
ipc - 语言/操作系统之间的进程间通信
我正在寻找一种进程间通信工具，可以在相同或不同系统上运行的语言和/或环境之间使用。例如，它应该允许在 Java、C# 和/或 C++ 组件之间发送信号，并且还应该支持某种排队机制。唯一明显与环境和语言
java - 使用正则表达式解析不同的语言环境/语言？
我有一些以不同语言返回的文本。现在，客户端返回的文本格式为(en-us，又名美国英语): Stuff here to keep. -- Delete Here -- all of this below
Julia 语言 : findInterval
问题:我希望在 R 中找到类似 findInterval 的函数，它为输入提供一个标量和一个表示区间起点的向量，并返回标量落入的区间的索引。例如在 R 中: findInterval(x = 2.6,
Java 语言 IllegalStateException
我是安卓新手。我正在尝试进行简单的登录 Activity ，但当我单击“登录”按钮时出现运行时错误。我认为我没有正确获取数据。我已经检查过，SQLite 中有一个与该 PK 相对应的数据。日志猫。
C#语言，计算器
大家好，感谢您帮助我。我用 C# 制作了这个计算器，但遇到了一个问题。当我添加像 5+5+5 这样的东西时，它给了我正确的结果，但是当我想减去两个以上的数字并且还想除或乘以两个以上的数字时，我没有
C 语言以二进制方式访问内存
关闭。此题需要details or clarity 。目前不接受答案。想要改进这个问题吗？通过 editing this post 添加详细信息并澄清问题. 已关闭 4 年前。 Improve th
C 语言 - 如何修复代码中的二分查找函数？
这就是我所拥有的 #include #include void print(int a[], int size); void sort (int a[], int size); v
C 语言我的代码中出现错误
你好，我正在寻找我哪里做错了？ #include #include int main(int argc, char *argv[]) { int account_on_the_ban
将数字读入数组时代码崩溃...C 语言
嘿，当我开始向数组输入数据时，我的代码崩溃了。该程序应该将数字读入数组，然后将新数字插入数组中，最后按升序排列所有内容。我不确定它出了什么问题。有人有建议吗？这是我的代码 #include #in
凯撒密码 C 语言
我已经盯着这个问题好几个星期了，但我一无所获!它不起作用，我知道那么多，但我不知道为什么或出了什么问题。我确实知道开发人员针对我突出显示的行吐出了“错误:预期表达式”，但这实际上只是冰山一角。如果有人
点对点聊天中程序的多个实例之间的通信 - C 语言
我正在编写一个点对点聊天程序。在此程序中，客户端和服务器功能写入一个唯一的文件中。首先我想问一下我程序中的机制是否正确？ I fork() two processes, one for client
计算不以句点结尾的段落，C 语言
基本上我需要找到一种方法来发现段落是否以句点 (.) 结束。此时我已经可以计算给定文本的段落数，但我没有想出任何东西来检查它是否在句点内结束。任何帮助都会帮助我，谢谢 char ch; FI
C 语言 -> 将段落中的单词分开
我的函数 save_words 接收 Armazena 和大小。 Armazena 是一个包含段落的动态数组，size 是数组的大小。在这个函数中，我想将单词放入其他称为单词的动态数组中。当我运行它时
比较两个字符 [C 语言]
我有一个结构 struct Human { char *name; struct location *location; int
C 语言 - 如何确保在读取多个输入文件时保持恒定格式？
我正在尝试缩进以下代码的字符串输出，但由于某种原因，我的变量不断从文件中提取，并且具有不同长度的噪声或空间(我不确定)。这是我的代码: #include #include int main (v
C 语言 - WHILE 循环的工作量超出了预期
我想让用户选择一个选项。所以我声明了一个名为 Choice 的变量，我希望它输入一个只能是 'M' 的 char 、'C'、'O' 或 'P'。这是我的代码: char Choice; printf
使用定义和变量连接数组 - C 语言
我正在寻找一种解决方案，将定义和变量的值连接到数组中。我已经尝试过像这样使用 memcpy 但它不起作用: #define ADDRESS {0x00, 0x00, 0x00, 0x00, 0x0

首页

博学

6Ren·AI

商城

python - 在 Python 中嵌入低性能脚本语言