
python-3.x - CPU faster than GPU using xgb and XGBClassifier

Reposted · Author: 行者123 · Updated: 2023-12-03 15:46:56

Apologies in advance, as I am a beginner. I am running a GPU vs. CPU test with XGBoost, using both xgb and XGBClassifier. The results are as follows:

passed time with xgb (gpu): 0.390s
passed time with XGBClassifier (gpu): 0.465s
passed time with xgb (cpu): 0.412s
passed time with XGBClassifier (cpu): 0.421s
I would like to know why the CPU seems to perform no worse than the GPU.
Here is my setup:
  • Python 3.6.1
  • OS: Windows 10, 64-bit
  • GPU: NVIDIA RTX 2070 Super, 8 GB VRAM (drivers updated to the latest version)
  • CUDA 10.1 installed
  • CPU: i7 10700, 2.9 GHz
  • Running in a Jupyter Notebook
  • Installed the nightly build of xgboost 1.2.0 via pip

    ** Also tried the xgboost version installed via pip from the pre-built binary wheel: same problem.
    Here is the test code I am using (taken from here):
    import time
    import xgboost as xgb

    param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
    'colsample_bytree':0.8, 'eta':0.5, 'min_child_weight':1,
    'tree_method':'gpu_hist'
    }

    num_round = 100

    dtrain = xgb.DMatrix(X_train2, y_train)
    tic = time.time()
    model = xgb.train(param, dtrain, num_round)
    print('passed time with xgb (gpu): %.3fs'%(time.time()-tic))

    xgb_param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
    'colsample_bytree':0.8, 'learning_rate':0.5, 'min_child_weight':1,
    'tree_method':'gpu_hist'}
    model = xgb.XGBClassifier(**xgb_param)
    tic = time.time()
    model.fit(X_train2, y_train)
    print('passed time with XGBClassifier (gpu): %.3fs'%(time.time()-tic))

    param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
    'colsample_bytree':0.8, 'eta':0.5, 'min_child_weight':1,
    'tree_method':'hist'}
    num_round = 100

    dtrain = xgb.DMatrix(X_train2, y_train)
    tic = time.time()
    model = xgb.train(param, dtrain, num_round)
    print('passed time with xgb (cpu): %.3fs'%(time.time()-tic))

    xgb_param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
    'colsample_bytree':0.8, 'learning_rate':0.5, 'min_child_weight':1,
    'tree_method':'hist'}
    model = xgb.XGBClassifier(**xgb_param)
    tic = time.time()
    model.fit(X_train2, y_train)
    print('passed time with XGBClassifier (cpu): %.3fs'%(time.time()-tic))
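As an aside, the repeated tic/print pattern in the script above can be factored into a small helper. This is just a sketch, independent of xgboost, using the standard library's `time.perf_counter`, which is monotonic and higher-resolution than `time.time()`:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    tic = time.perf_counter()  # monotonic clock, preferred for benchmarking
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - tic

# Usage would then be, e.g.:
#   model, secs = timed(xgb.train, param, dtrain, num_round)
#   print('passed time with xgb (gpu): %.3fs' % secs)
```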
    I tried combining this with a Sklearn grid search to see whether I would get faster speeds on the GPU, but it ended up much slower than the CPU:
    passed time with XGBClassifier (gpu): 2457.510s
    Best parameter (CV score=0.490):
    {'xgbclass__alpha': 100, 'xgbclass__eta': 0.01, 'xgbclass__gamma': 0.2, 'xgbclass__max_depth': 5, 'xgbclass__n_estimators': 100}


    passed time with XGBClassifier (cpu): 383.662s
    Best parameter (CV score=0.487):
    {'xgbclass__alpha': 100, 'xgbclass__eta': 0.1, 'xgbclass__gamma': 0.2, 'xgbclass__max_depth': 2, 'xgbclass__n_estimators': 20}
    I am using a dataset with 75k observations. Any idea why I am not getting a speedup from using the GPU? Is the dataset too small to gain anything from GPU use?
    Any help would be much appreciated. Thanks a lot!

    Best Answer

    Interesting question. As you noted, a few examples of this have been reported on Github and the official xgboost site:

  • https://github.com/dmlc/xgboost/issues/2819
  • https://discuss.xgboost.ai/t/no-gpu-usage-when-using-gpu-hist/532

    Other users have raised similar questions as well:

  • No speedup using XGBClassifier with GPU support

    Looking at the official xgboost documentation, there is an extensive section on GPU support.
    There are a few things to check. The documentation states:

    Tree construction (training) and prediction can be accelerated with CUDA-capable GPUs.


    1. Is your GPU CUDA-enabled?
    Yes, it is.
    2. Are you using parameters that are affected by GPU use?
    Keep in mind that only certain parameters benefit from using the GPU. Those are:
    {subsample, sampling_method, colsample_bytree, colsample_bylevel, max_bin, gamma, gpu_id, predictor, grow_policy, monotone_constraints, interaction_constraints, single_precision_histogram}
    Yes, you are. Most of these are included in your hyperparameter set, which is a good thing.
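For illustration only, here is a hypothetical parameter dict that exercises several of the GPU-sensitive options from that list (the values are placeholders, not tuned recommendations):

```python
# Hypothetical example: parameters drawn from the GPU-sensitive list above.
gpu_params = {
    'tree_method': 'gpu_hist',           # train the histogram algorithm on the GPU
    'gpu_id': 0,                         # which CUDA device to use
    'subsample': 0.8,                    # row sampling, done on-device
    'colsample_bytree': 0.8,             # column sampling per tree
    'max_bin': 256,                      # histogram bin count
    'single_precision_histogram': True,  # faster, slightly less precise histograms
}
print(sorted(gpu_params))
```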
    3. Are you configuring your parameters to use GPU support?
    If you look at the XGBoost Parameters page, you can find additional areas that may help improve your times. For example, updater can be set to grow_gpu_hist, which (note: this is moot since you already set tree_method, but included for the record):

    grow_gpu_hist: Grow tree with GPU.


    At the bottom of the parameters page there are additional parameters that apply when gpu_hist is enabled, in particular deterministic_histogram (note: this is moot, since it defaults to True):

    Build histogram on GPU deterministically. Histogram building is not deterministic due to the non-associative aspect of floating point summation. We employ a pre-rounding routine to mitigate the issue, which may lead to slightly lower accuracy. Set to false to disable it.


    4. Your data
    I ran some interesting experiments with some data. Since I do not have access to your data, I used sklearn's make_classification, which generates data in a rather robust way.
    I made a few changes to your script but noticed no change: I varied the hyperparameters in the GPU vs. CPU examples, ran it 100 times and averaged the results, etc. Nothing stood out to me. I recalled once using XGBoost's GPU vs. CPU capability to speed up some analytics, but I was working on a much larger dataset then.
    I edited your script slightly to use this data, and started changing the number of samples and features in the dataset (via the n_samples and n_features parameters) to observe the effect on runtime. It appears that a GPU significantly improves training time for high-dimensional data, but bulk data with many samples does not see a huge improvement. See my script below:
    import time

    import xgboost as xgb
    from sklearn.datasets import make_classification

    xgb_gpu = []
    xgbclassifier_gpu = []
    xgb_cpu = []
    xgbclassifier_cpu = []

    n_samples = 75000
    n_features = 500

    for i in range(10):
        n_samples += 10000
        n_features += 300
        # Make my own data since I do not have the data from the SO question
        X_train2, y_train = make_classification(n_samples=n_samples,
                                                n_features=int(n_features * 0.9),
                                                n_informative=int(n_features * 0.1),
                                                n_redundant=100, flip_y=0.10,
                                                random_state=8)

        # Keep script from OP intact
        param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
                 'colsample_bytree':0.8, 'eta':0.5, 'min_child_weight':1,
                 'tree_method':'gpu_hist', 'gpu_id': 0}
        num_round = 100

        dtrain = xgb.DMatrix(X_train2, y_train)
        tic = time.time()
        model = xgb.train(param, dtrain, num_round)
        print('passed time with xgb (gpu): %.3fs'%(time.time()-tic))
        xgb_gpu.append(time.time()-tic)

        xgb_param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
                     'colsample_bytree':0.8, 'learning_rate':0.5, 'min_child_weight':1,
                     'tree_method':'gpu_hist', 'gpu_id':0}
        model = xgb.XGBClassifier(**xgb_param)
        tic = time.time()
        model.fit(X_train2, y_train)
        print('passed time with XGBClassifier (gpu): %.3fs'%(time.time()-tic))
        xgbclassifier_gpu.append(time.time()-tic)

        param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
                 'colsample_bytree':0.8, 'eta':0.5, 'min_child_weight':1,
                 'tree_method':'hist'}
        num_round = 100

        dtrain = xgb.DMatrix(X_train2, y_train)
        tic = time.time()
        model = xgb.train(param, dtrain, num_round)
        print('passed time with xgb (cpu): %.3fs'%(time.time()-tic))
        xgb_cpu.append(time.time()-tic)

        xgb_param = {'max_depth':5, 'objective':'binary:logistic', 'subsample':0.8,
                     'colsample_bytree':0.8, 'learning_rate':0.5, 'min_child_weight':1,
                     'tree_method':'hist'}
        model = xgb.XGBClassifier(**xgb_param)
        tic = time.time()
        model.fit(X_train2, y_train)
        print('passed time with XGBClassifier (cpu): %.3fs'%(time.time()-tic))
        xgbclassifier_cpu.append(time.time()-tic)

    import pandas as pd
    df = pd.DataFrame({'XGB GPU': xgb_gpu, 'XGBClassifier GPU': xgbclassifier_gpu,
                       'XGB CPU': xgb_cpu, 'XGBClassifier CPU': xgbclassifier_cpu})
    #df.to_csv('both_results.csv')
    I varied each (samples, features) separately and together on the same dataset. See the results below:
    | Interval |  XGB GPU | XGBClassifier GPU |  XGB CPU | XGBClassifier CPU |      Metric      |
    |:--------:|:--------:|:-----------------:|:--------:|:-----------------:|:----------------:|
    | 0 | 11.3801 | 12.00785 | 15.20124 | 15.48131 | Changed Features |
    | 1 | 15.67674 | 16.85668 | 20.63819 | 22.12265 | Changed Features |
    | 2 | 18.76029 | 20.39844 | 33.23108 | 32.29926 | Changed Features |
    | 3 | 23.147 | 24.91953 | 47.65588 | 44.76052 | Changed Features |
    | 4 | 27.42542 | 29.48186 | 50.76428 | 55.88155 | Changed Features |
    | 5 | 30.78596 | 33.03594 | 71.4733 | 67.24275 | Changed Features |
    | 6 | 35.03331 | 37.74951 | 77.68997 | 75.61216 | Changed Features |
    | 7 | 39.13849 | 42.17049 | 82.95307 | 85.83364 | Changed Features |
    | 8 | 42.55439 | 45.90751 | 92.33368 | 96.72809 | Changed Features |
    | 9 | 46.89023 | 50.57919 | 105.8298 | 107.3893 | Changed Features |
    | 0 | 7.013227 | 7.303488 | 6.998254 | 9.733574 | No Changes |
    | 1 | 6.757523 | 7.302388 | 5.714839 | 6.805287 | No Changes |
    | 2 | 6.753428 | 7.291906 | 5.899611 | 6.603533 | No Changes |
    | 3 | 6.749848 | 7.293555 | 6.005773 | 6.486256 | No Changes |
    | 4 | 6.755352 | 7.297607 | 5.982163 | 8.280619 | No Changes |
    | 5 | 6.756498 | 7.335412 | 6.321188 | 7.900422 | No Changes |
    | 6 | 6.792402 | 7.332112 | 6.17904 | 6.443676 | No Changes |
    | 7 | 6.786584 | 7.311666 | 7.093638 | 7.811417 | No Changes |
    | 8 | 6.7851 | 7.30604 | 5.574762 | 6.045969 | No Changes |
    | 9 | 6.789152 | 7.309363 | 5.751018 | 6.213471 | No Changes |
    | 0 | 7.696765 | 8.03615 | 6.175457 | 6.764809 | Changed Samples |
    | 1 | 7.914885 | 8.646722 | 6.997217 | 7.598789 | Changed Samples |
    | 2 | 8.489555 | 9.2526 | 6.899783 | 7.202334 | Changed Samples |
    | 3 | 9.197605 | 10.02934 | 7.511708 | 7.724675 | Changed Samples |
    | 4 | 9.73642 | 10.64056 | 7.918493 | 8.982463 | Changed Samples |
    | 5 | 10.34522 | 11.31103 | 8.524865 | 9.403711 | Changed Samples |
    | 6 | 10.94025 | 11.98357 | 8.697257 | 9.49277 | Changed Samples |
    | 7 | 11.80717 | 12.93195 | 8.734307 | 10.79595 | Changed Samples |
    | 8 | 12.18282 | 13.38646 | 9.175231 | 10.33532 | Changed Samples |
    | 9 | 13.05499 | 14.33106 | 11.04398 | 10.50722 | Changed Samples |
    | 0 | 12.43683 | 13.19787 | 12.80741 | 13.86206 | Changed Both |
    | 1 | 18.59139 | 20.01569 | 25.61141 | 35.37391 | Changed Both |
    | 2 | 24.37475 | 26.44214 | 40.86238 | 42.79259 | Changed Both |
    | 3 | 31.96762 | 34.75215 | 68.869 | 59.97797 | Changed Both |
    | 4 | 41.26578 | 44.70537 | 83.84672 | 94.62811 | Changed Both |
    | 5 | 49.82583 | 54.06252 | 109.197 | 108.0314 | Changed Both |
    | 6 | 59.36528 | 64.60577 | 131.1234 | 140.6352 | Changed Both |
    | 7 | 71.44678 | 77.71752 | 156.1914 | 161.4897 | Changed Both |
    | 8 | 81.79306 | 90.56132 | 196.0033 | 193.4111 | Changed Both |
    | 9 | 94.71505 | 104.8044 | 215.0758 | 224.6175 | Changed Both |
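One way to read the table is as a CPU/GPU speedup ratio per interval. A minimal sketch, with the first few "Changed Features" rows plugged in as placeholder data:

```python
import pandas as pd

# First three 'Changed Features' rows from the table above, in seconds.
df = pd.DataFrame({
    'XGB GPU': [11.3801, 15.67674, 18.76029],
    'XGB CPU': [15.20124, 20.63819, 33.23108],
})
df['speedup'] = df['XGB CPU'] / df['XGB GPU']  # >1 means the GPU run was faster
print(df['speedup'].round(2).tolist())
```

The ratio grows as the feature count grows, which is the trend the answer describes.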
    No changes (figure omitted)
    Linearly increasing feature count (figure omitted)
    Linearly increasing samples (figure omitted)
    Linearly increasing samples + features (figure omitted)
    As I started looking into this more, it made sense. GPUs are known to scale well with high-dimensional data, so it stands to reason that you would see an improvement in training time if your data were high-dimensional. See the following examples:

  • https://projecteuclid.org/download/pdfview_1/euclid.ss/1294167962
  • Faster Kmeans Clustering on High-dimensional Data with GPU Support
  • https://link.springer.com/article/10.1007/s11063-014-9383-4

  • While we cannot be sure without access to your data, it seems that the hardware capabilities of a GPU enable significant performance gains when the data supports them, and given the size and shape of your data, that does not appear to be the case here.

    Regarding "python-3.x - CPU faster than GPU using xgb and XGBClassifier", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63442697/
