
python - Exception when reading the tutorial CSV file in the Cloudera VM

Reposted · Author: 行者123 · Updated: 2023-12-02 21:22:18

I am trying to work through the Spark tutorial that ships with the Cloudera virtual machine. However, even though I use the correct line-ending encoding, I cannot execute the scripts, because I get a huge number of errors.
The tutorial is part of the Coursera Introduction to Big Data Analytics course. The assignment can be found here.

Here is what I did. Install the IPython shell (if not already done):

sudo easy_install ipython==1.2.1

Open/start the shell (with either 1.2.0 or 1.4.0):
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.2.0

Set the record delimiter to Windows-style line endings. This is necessary because the file is Windows-encoded, as stated in the course. If you don't do this, you get other errors:
sc._jsc.hadoopConfiguration().set('textinputformat.record.delimiter','\r\n')
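To see why this setting matters, here is a minimal plain-Python sketch (no Spark involved) of what happens when CRLF-terminated records are split on `'\n'` alone, which is effectively the Hadoop default:

```python
# The tutorial CSV uses Windows (CRLF) line endings. Splitting records on
# '\n' alone leaves a stray '\r' on the last field of every row, which
# corrupts values and breaks schema inference.
data = b"id,name\r\n1,alice\r\n2,bob\r\n"

# Split on '\n' only (the default behavior):
bad_rows = [line.split(b",") for line in data.split(b"\n") if line]
# Split on the full '\r\n' record delimiter (what the setting above does):
good_rows = [line.split(b",") for line in data.split(b"\r\n") if line]

print(bad_rows[1])   # [b'1', b'alice\r']  <- trailing carriage return
print(good_rows[1])  # [b'1', b'alice']
```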

Try to load the CSV file:
yelp_df = sqlCtx.load(
    source='com.databricks.spark.csv',
    header='true',
    inferSchema='true',
    path='file:///usr/lib/hue/apps/search/examples/collections/solr_configs_yelp_demo/index_data.csv'
)
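As an aside on the `inferSchema = 'true'` option: spark-csv inspects the column values and picks the narrowest type that parses. A rough plain-Python sketch of the idea (`infer_type` is a hypothetical helper for illustration, not part of any library):

```python
# Toy sketch of schema inference: try to parse every value in a column as
# an integer, then as a double, and fall back to string otherwise.
def infer_type(values):
    for cast, type_name in ((int, "integer"), (float, "double")):
        try:
            for v in values:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"

print(infer_type(["1", "42"]))     # integer
print(infer_type(["1.5", "2"]))    # double
print(infer_type(["4", "stars"]))  # string
```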

But I get a very long list of errors, which starts like this:
Py4JJavaError: An error occurred while calling o23.load.: java.lang.RuntimeException: 
Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:472)

The full error message can be seen here. This is /etc/hive/conf/hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->

<!-- Hive Execution Parameters -->

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
</property>

<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>

<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>

Any help or ideas on how to fix this? I guess it is a fairly common error, but I could not find any solution.

One more thing: is there a way to dump such long error messages to a separate log file?
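Regarding that last question: the Py4J stack traces are written to standard error, so in a POSIX shell they can be captured with a `2>` redirection. A generic sketch, using a stand-in command rather than the actual pyspark invocation:

```shell
# Redirect stderr (where the long Java stack traces are printed) to a
# file, keeping normal output on the terminal. The same '2> pyspark.log'
# suffix can be appended to the pyspark command shown earlier.
python3 -c 'import sys; sys.stderr.write("very long stack trace\n")' 2> pyspark.log
cat pyspark.log
```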

Best Answer

Summary of the discussion:
Executing the following command solves the problem:

sudo cp /etc/hive/conf.dist/hive-site.xml /usr/lib/spark/conf/
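Why the copy helps (an interpretation, not stated in the discussion itself): a HiveContext looks for hive-site.xml on Spark's conf path; with the file in place, Spark connects to the remote metastore configured there instead of trying to instantiate a local one, which is the step that fails above. A small sketch that pulls the relevant property out of XML shaped like the file in the question:

```python
# Extract hive.metastore.uris from a hive-site.xml-style document, the
# property Spark picks up once the file is copied into its conf dir.
import xml.etree.ElementTree as ET

hive_site = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
  </property>
</configuration>"""

props = {p.findtext("name"): p.findtext("value")
         for p in ET.fromstring(hive_site).iter("property")}
print(props["hive.metastore.uris"])  # thrift://127.0.0.1:9083
```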

For more information, see https://www.coursera.org/learn/bigdata-analytics/supplement/tyH3p/setup-pyspark-for-dataframes

Regarding "python - Exception when reading the tutorial CSV file in the Cloudera VM", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36966550/
