gpt4 book ai didi

python - 使用 python 访问 HDInsight Hive

转载 作者:可可西里 更新时间:2023-11-01 14:56:03 28 4
gpt4 key购买 nike

我们有一个 HDInsight 群集,其中一些表位于 HIVE 中。我想从客户端计算机(Azure 外部)的 Python 3.6 查询这些表。

我尝试过使用 PyHivepyhs2 以及 impyla,但我在使用它们时遇到了各种问题。

有人有从 Python 访问 HDInsight HIVE 的工作示例吗?

我对此经验很少,并且不知道如何配置PyHive(这似乎是最有前途的),尤其是在授权方面。

使用impyla:

from impala.dbapi import connect
conn = connect(host='redacted.azurehdinsight.net',port=443)
cursor = conn.cursor()
cursor.execute('SELECT * FROM cs_test LIMIT 100')
print(cursor.description) # prints the result set's schema
results = cursor.fetchall()

这给出:

Traceback (most recent call last):
File "C:/git/ml-notebooks/impyla.py", line 3, in <module>
cursor = conn.cursor()
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 125, in cursor
session = self.service.open_session(user, configuration)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 995, in open_session
resp = self._rpc('OpenSession', req)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 923, in _rpc
response = self._execute(func_name, request)
File "C:\Users\chris\Anaconda3\lib\site-packages\impala\hiveserver2.py", line 954, in _execute
.format(self.retries))
impala.error.HiveServer2Error: Failed after retrying 3 times

使用Pyhive:

from pyhive import hive

conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
#also tried other auth-types, but as i said, i have no clue here

这给出:

Traceback (most recent call last):
File "C:/git/ml-notebooks/PythonToHive.py", line 3, in <module>
conn = hive.connect(host="redacted.azurehdinsight.net",port=443,auth="NOSASL")
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 64, in connect
return Connection(*args, **kwargs)
File "C:\Users\chris\Anaconda3\lib\site-packages\pyhive\hive.py", line 164, in __init__
response = self._client.OpenSession(open_session_req)
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 187, in OpenSession
return self.recv_OpenSession()
File "C:\Users\chris\Anaconda3\lib\site-packages\TCLIService\TCLIService.py", line 199, in recv_OpenSession
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\protocol\TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 60, in readAll
chunk = self.read(sz - have)
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TTransport.py", line 161, in read
self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
File "C:\Users\chris\Anaconda3\lib\site-packages\thrift\transport\TSocket.py", line 117, in read
buff = self.handle.recv(sz)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

最佳答案

根据官方文档Understand and resolve errors received from WebHCat on HDInsight ,如下所述。

What is WebHCat

WebHCat is a REST API for HCatalog, a table, and storage management layer for Hadoop. WebHCat is enabled by default on HDInsight clusters, and is used by various tools to submit jobs, get job status, etc. without logging in to the cluster.

所以解决方法是使用WebHCat在Python中运行Hive QL,请引用Hive document学习和使用它。作为引用,有类似的MSDN thread对此进行了讨论。

希望有帮助。

关于python - 使用 python 访问 HDInsight Hive,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/46369427/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com