
python - IBM Bluemix Spark: Supplying python dependencies to spark-submit.sh

Reposted · Author: 行者123 · Updated: 2023-11-28 18:21:49

I am using the Cloudant Python API in an IBM Bluemix PySpark application.

How do I supply dependency packages to spark-submit? The --py-files option of spark-submit.sh only accepts .py, .zip, or .egg files, but my package is distributed in tar.gz and whl formats.

Here is a link to the Cloudant Python client library I am trying to use - https://pypi.python.org/pypi/cloudant

The article How to install dependencies for python covers the same topic, but I would like to see examples of the requirements.txt, Procfile, and manifest.yml files mentioned in that solution.
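One workaround worth noting for the whl case: a .whl for a pure-Python package is itself a zip archive, so it can be renamed to .zip and passed to --py-files, or added to sys.path directly, which is effectively what --py-files does on the executors. A minimal sketch, using a hypothetical demo_pkg archive built on the fly in place of a real wheel:

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip that has the same layout as a pure-Python wheel:
# a package directory containing an __init__.py. "demo_pkg" is a
# hypothetical name used only for illustration.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo_pkg.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "VERSION = '1.0'\n")

# Putting the archive on sys.path is the same effect --py-files has
# on each executor; Python's zipimport machinery handles the rest.
sys.path.insert(0, archive)
import demo_pkg
print(demo_pkg.VERSION)
```

This only works for pure-Python packages; wheels with compiled extensions cannot be imported from a zip archive this way.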

Best Answer

You should be able to use pip programmatically from within your Python script, e.g.:

import pip
pip.main(['install', '--user', 'cloudant'])

This worked for me:

helloSpark.py

import sys
from pyspark import SparkContext

import pip
pip.main(['install', '--user', 'cloudant'])

from cloudant.client import Cloudant
client = Cloudant('username', 'password', account='account', connect=True)

# do some spark processing
def computeStatsForCollection(sc, countPerPartitions=100000, partitions=5):
    totalNumber = min(countPerPartitions * partitions, sys.maxsize)
    rdd = sc.parallelize(range(totalNumber), partitions)
    return (rdd.mean(), rdd.variance())

if __name__ == "__main__":
    sc = SparkContext(appName="Hello Spark")
    print("Hello Spark Demo. Compute the mean and variance of a collection")
    stats = computeStatsForCollection(sc)
    print(">>> Results: ")
    print(">>>>>>>Mean: " + str(stats[0]))
    print(">>>>>>>Variance: " + str(stats[1]))
    sc.stop()

run.sh

./spark-submit.sh --vcap ./vcap.json --deploy-mode cluster \
  --master https://169.54.219.20:8443 \
  --conf spark.service.spark_version=1.6 \
  helloSpark.py

stdout after the run:

$ cat stdout_1498114277669877424
no extra config
load default config from : /usr/local/src/spark160master/spark/profile/batch/
Requirement already satisfied: cloudant in /gpfs/global_fs01/sym_shared/YPProdSpark/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages
Requirement already satisfied: requests<3.0.0,>=2.7.0 in /usr/local/src/bluemix_jupyter_bundle.v47/notebook/lib/python2.7/site-packages (from cloudant)
Traceback (most recent call last):
  File "/tmp/spark-160-ego-master/work/spark-driver-380d8ae7-4ddc-452e-bb29-1665375a348c/helloSpark.py", line 8, in <module>
    client = Cloudant('username', 'password', account='account', connect=True)
  File "/gpfs/fs01/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages/cloudant/client.py", line 443, in __init__
    self.connect()
  File "/gpfs/fs01/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages/cloudant/client.py", line 114, in connect
    self.session_login(self._user, self._auth_token)
  File "/gpfs/fs01/user/s9c8-cbcae60bfa1d3e-39ca506ba762/.local/lib/python2.7/site-packages/cloudant/client.py", line 172, in session_login
    resp.raise_for_status()
  File "/usr/local/src/bluemix_jupyter_bundle.v47/notebook/lib/python2.7/site-packages/requests/models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://account.cloudant.com/_session

Unfortunately, I did not save the output of my first run, which reported that Cloudant was being installed. But here you can see that the Cloudant library is available and attempts to connect to the cluster with invalid credentials, so Cloudant returns a 401 error.

You probably don't want to attempt a pip install every time the script runs, so you can try this instead:

try:
    import cloudant
except ImportError:
    import pip
    pip.main(['install', '--user', 'cloudant'])
    import cloudant

This tries to load the Cloudant library first; if loading fails (for example, because it has not been installed yet), the library is installed with pip.
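Note that pip.main() was removed from pip's public API in pip 10, so on newer environments the same pattern is more robust when it shells out to "python -m pip" instead. A sketch of that variant; ensure_package is a hypothetical helper name, not part of any library:

```python
import importlib
import subprocess
import sys

def ensure_package(module_name, pip_name=None):
    """Import module_name, installing it into the --user site with pip first if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # "python -m pip" is the invocation pip's maintainers recommend
        # over the removed pip.main() entry point.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", "--user", pip_name or module_name]
        )
        return importlib.import_module(module_name)

# e.g. cloudant = ensure_package("cloudant")
```

Unlike the try/except block above, this returns the imported module even when the install branch runs, so the caller always gets a usable module object.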

For python - IBM Bluemix Spark: Supplying python dependencies to spark-submit.sh, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44688434/
