
hadoop - sqlContext.createDataFrame raises an error

Reposted. Author: 行者123. Updated: 2023-12-02 20:58:13

I am new to Spark and am trying to import a CSV file into Spark 2.0.2. I am using pyspark on Windows 10. This is my code so far:

    from pyspark.sql.types import *
    import csv
    projectFile = sc.textFile("bankfull.csv",4)
    schema = StructType([StructField("int_field", IntegerType()),StructField("string_field", StringType())])
    header = projectFile.first()
    projectHeader = projectFile.filter(lambda l: "age" in l)
    projectNoHeader = projectFile.subtract(projectHeader)
    project_rdd = projectNoHeader.mapPartitions(lambda x: csv.reader(x, delimiter=","))
    project_df = sqlContext.createDataFrame(project_rdd,schema)

At this point, I get the following error message:
An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.applySchemaToPythonRDD(SparkSession.scala:666)
at org.apache.spark.sql.SparkSession.applySchemaToPythonRDD(SparkSession.scala:656)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: ---------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 32 more

How can I fix this?
Thanks

Best answer

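When running Spark on Windows, try simulating the Hive scratch location on the C: drive by creating a tmp/hive folder there. For this to work, winutils.exe must be placed in the bin folder of the $HADOOP_HOME you have set. You can download winutils.exe from the link given in the original answer. If you still run into the problem, try granting full access permissions on the /tmp/hive directory, or create the c:/tmp/hive directory as an administrator who has the required permissions.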
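The steps in this answer can be sketched in Python. This is only an illustration under stated assumptions: the helper names `winutils_chmod_command` and `prepare_scratch_dir` are hypothetical, and the `chmod 777` subcommand is the commonly used way to grant full access with winutils.exe, which the answer itself does not spell out. The sketch creates the directory and builds the command; it runs the command only when asked to and only if winutils.exe is actually found under `HADOOP_HOME\bin`.

```python
import os
import subprocess
from pathlib import Path


def winutils_chmod_command(hadoop_home, scratch_dir, mode="777"):
    """Build the winutils.exe chmod command as an argument list.

    Assumption: winutils.exe lives in <hadoop_home>/bin, as the answer describes.
    """
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    return [str(winutils), "chmod", mode, str(scratch_dir)]


def prepare_scratch_dir(scratch_dir, hadoop_home=None, run=False):
    """Create the scratch directory and optionally open up its permissions.

    Returns the command that was (or would be) executed, so it can be
    inspected or pasted into an administrator command prompt.
    """
    # Fall back to the HADOOP_HOME environment variable if none is given.
    hadoop_home = hadoop_home or os.environ.get("HADOOP_HOME", "")

    # Step 1: make sure the directory exists (e.g. C:\tmp\hive).
    os.makedirs(scratch_dir, exist_ok=True)

    # Step 2: build the command that grants full access to the directory.
    cmd = winutils_chmod_command(hadoop_home, scratch_dir)

    # Step 3: execute it only on request and only if winutils.exe is present.
    if run and os.path.isfile(cmd[0]):
        subprocess.run(cmd, check=True)
    return cmd
```

On the asker's machine this might be called as `prepare_scratch_dir(r"C:\tmp\hive", run=True)` from a shell started as administrator, before launching pyspark again. Returning the command instead of hiding it makes it easy to see exactly what would be run if the call is left at `run=False`.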

Regarding "hadoop - sqlContext.createDataFrame raises an error", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43927591/
