Python+C is (slightly) faster than pure C

Reposted by 太空狗. Updated: 2023-10-29 15:39:54


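I have been implementing the same code (counting the number of ways to be dealt cards in Blackjack without busting) in a variety of languages and implementations. One oddity I noticed is that the Python implementation that calls the partitions function written in C is actually slightly faster than the whole program written in C. The same seems to hold for other languages (Ada vs. Python calling Ada, Nim vs. Python calling Nim). This seems counterintuitive to me. Any idea how this is possible?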

All of the code is in my GitHub repository:

https://github.com/octonion/puzzles/tree/master/blackjack

Here is the C code, compiled with "gcc -O3 outcomes.c".

#include <stdio.h>

int partitions(int cards[10], int subtotal)
{
    //writeln(cards,subtotal);
    int m = 0;
    int total;
    // Hit
    for (int i = 0; i < 10; i++)
    {
        if (cards[i] > 0)
        {
            total = subtotal + i + 1;
            if (total < 21)
            {
                // Stand
                m += 1;
                // Hit again
                cards[i] -= 1;
                m += partitions(cards, total);
                cards[i] += 1;
            }
            else if (total == 21)
            {
                // Stand; hit again is an automatic bust
                m += 1;
            }
        }
    }
    return m;
}

int main(void)
{
    int deck[] =
    { 4, 4, 4, 4, 4, 4, 4, 4, 4, 16 };
    int d = 0;

    for (int i = 0; i < 10; i++)
    {
        // Dealer showing
        deck[i] -= 1;
        int p = 0;
        for (int j = 0; j < 10; j++)
        {
            deck[j] -= 1;
            int n = partitions(deck, j + 1);
            deck[j] += 1;
            p += n;
        }

        printf("Dealer showing %i partitions = %i\n", i, p);
        d += p;
        deck[i] += 1;
    }
    printf("Total partitions = %i\n", d);
    return 0;
}

Here is the C function, compiled with "gcc -O3 -fPIC -shared -o libpartitions.so partitions.c".

int partitions(int cards[10], int subtotal)
{
    int m = 0;
    int total;
    // Hit
    for (int i = 0; i < 10; i++)
    {
        if (cards[i] > 0)
        {
            total = subtotal + i + 1;
            if (total < 21)
            {
                cards[i] -= 1;
                // Stand
                m += 1;
                // Hit again
                m += partitions(cards, total);
                cards[i] += 1;
            }
            else if (total == 21)
            {
                // Stand; hit again is an automatic bust
                m += 1;
            }
        }
    }
    return m;
}

Here is the Python wrapper for the C function:

#!/usr/bin/env python

from ctypes import *
import os

test_lib = cdll.LoadLibrary(os.path.abspath("libpartitions.so"))
test_lib.partitions.argtypes = [POINTER(c_int), c_int]
test_lib.partitions.restype = c_int

deck = ([4]*9)
deck.append(16)

d = 0

for i in range(10):
    # Dealer showing
    deck[i] -= 1
    p = 0
    for j in range(10):
        deck[j] -= 1
        nums_arr = (c_int*len(deck))(*deck)
        n = test_lib.partitions(nums_arr, c_int(j+1))
        deck[j] += 1
        p += n
    print('Dealer showing ', i,' partitions =',p)
    d += p
    deck[i] += 1

print('Total partitions =',d)
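
When comparing builds, it helps to confirm that every variant still computes the same counts. The sketch below is a pure-Python port of the C `partitions` function shown above. It is far slower than the C version and is only meant for cross-checking results on small decks; the sample decks at the bottom are illustrative and not taken from the original post.

```python
def partitions(cards, subtotal):
    """Count the ways to keep drawing from `subtotal` without busting.

    Pure-Python port of the C function above: cards[i] is the number of
    cards of value i + 1 still in the deck.
    """
    m = 0
    for i in range(10):
        if cards[i] > 0:
            total = subtotal + i + 1
            if total < 21:
                m += 1  # stand on this total
                cards[i] -= 1
                m += partitions(cards, total)  # or hit again
                cards[i] += 1  # put the card back for the next branch
            elif total == 21:
                m += 1  # stand; another hit is an automatic bust
    return m


if __name__ == "__main__":
    # Small decks that are easy to verify by hand (illustrative inputs).
    print(partitions([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 20))  # prints 1: the ace makes 21
    print(partitions([0, 0, 0, 0, 0, 0, 0, 0, 0, 2], 0))   # prints 2: 10, then 10+10
```

The same small inputs can be passed to `test_lib.partitions` through ctypes to check that the shared library agrees.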

Best answer

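I think the reason lies in how GCC compiles the function partitions in the two cases. You can use objdump to compare the assembly code in the outcomes executable and in libpartitions.so to see the differences.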

objdump -d -M intel <file name>
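
Reading two full disassemblies side by side is tedious, so a small script can pull out just the `partitions` symbol from each file and diff the two. The following is a minimal sketch, assuming `objdump` is on the PATH. The helper names `disassemble` and `extract_function` are made up for this example, and the two file paths come from the command line. It matches the symbol name exactly, so any compiler-generated copy under a different name is not picked up, and jump targets still contain absolute addresses, so some diff noise is expected.

```python
import difflib
import subprocess
import sys


def disassemble(path):
    """Return the Intel-syntax disassembly of `path` as text."""
    result = subprocess.run(
        ["objdump", "-d", "-M", "intel", "--no-show-raw-insn", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def extract_function(disasm, name):
    """Return the instruction lines of symbol `name`, addresses stripped."""
    lines = []
    inside = False
    for line in disasm.splitlines():
        if line.endswith("<%s>:" % name):
            inside = True  # header such as "0000000000001120 <partitions>:"
            continue
        if inside:
            if not line.strip():
                break  # objdump separates symbols with a blank line
            # Drop the leading "address:" column so two files can be compared.
            lines.append(line.split(":", 1)[-1].strip())
    return lines


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_asm.py <executable> <shared library>
    first, second = sys.argv[1], sys.argv[2]
    a = extract_function(disassemble(first), "partitions")
    b = extract_function(disassemble(second), "partitions")
    for line in difflib.unified_diff(a, b, first, second, lineterm=""):
        print(line)
```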

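When building the shared library, GCC does not know how partitions will be called. In the C program, by contrast, GCC knows exactly where partitions is called (which, in this case, happens to result in worse performance). This difference in context makes GCC optimize the function differently.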

You can try different compilers and compare the results. I checked with GCC 5.4 and Clang 6.0: with GCC 5.4 the Python script runs faster, while with Clang the C program runs faster.
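
To repeat this comparison on another machine or compiler, the variants need to be timed the same way. The sketch below is one possible harness, not the method used for the measurements above. It runs each command several times and reports the fastest wall-clock time. The function name `best_time` and the file names in the comments are hypothetical. The measured time for the Python variant includes interpreter start-up and library loading, so small differences should be read with care.

```python
import subprocess
import sys
import time


def best_time(cmd, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output so printing does not dominate timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical usage, after building each variant separately, e.g.
    #   gcc -O3 -o outcomes_gcc outcomes.c
    #   clang -O3 -o outcomes_clang outcomes.c
    # then:
    #   python bench.py ./outcomes_gcc ./outcomes_clang "python outcomes.py"
    for arg in sys.argv[1:]:
        print("%-40s %.3f s" % (arg, best_time(arg.split())))
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on a run that is otherwise deterministic.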

A similar question about Python+C being (slightly) faster than pure C can be found on Stack Overflow: https://stackoverflow.com/questions/49228106/



