- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我使用brew安装了spark和hadoop:
brew info hadoop #=> hadoop: stable 3.1.2
brew info apache-spark #=> apache-spark: stable 2.4.4
import pyspark
conf = pyspark.SparkConf()
sc = pyspark.SparkContext('local[4]', conf=conf)
sc._jsc.hadoopConfiguration().set("fs.s3a.awsAccessKeyId", "key here")
sc._jsc.hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", "key here")
sql = pyspark.SQLContext(sc)
df = sql.read.csv('s3a://pilo/fi/data_2014_1.csv')
19/09/17 14:34:52 WARN FileStreamSink: Error while looking for metadata directory.
Traceback (most recent call last):
File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Users/cyrusghazanfar/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/Users/cyrusghazanfar/Desktop/startup-studio/pilota_project/pilota_ml/ingestion/clients/aws_s3.py", line 51, in <module>
df = sql.read.csv('s3a://pilo/fi/data_2014_1.csv')
File "/Users/cyrusghazanfar/Desktop/startup-studio/pilota_project/pilota_ml/env/lib/python3.6/site-packages/pyspark/sql/readwriter.py", line 476, in csv
return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
File "/Users/cyrusghazanfar/Desktop/startup-studio/pilota_project/pilota_ml/env/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/Users/cyrusghazanfar/Desktop/startup-studio/pilota_project/pilota_ml/env/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/Users/cyrusghazanfar/Desktop/startup-studio/pilota_project/pilota_ml/env/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o26.csv.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:618)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:835)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 30 more
最佳答案
也许thisisprabin的解决方案here可能有帮助?
将以下内容添加到此文件“hadoop / etc / hadoop / core-site.xml”
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>***</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>***</value>
</property>
sudo cp hadoop/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar hadoop/share/hadoop/common/lib/
sudo cp hadoop/share/hadoop/tools/lib/hadoop-aws-2.7.5.jar hadoop/share/hadoop/common/lib/
关于apache-spark - 从AWS S3读取pyspark文件不起作用,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57980317/
今天有小伙伴给我留言问到,try{...}catch(){...}是什么意思?它用来干什么? 简单的说 他们是用来捕获异常的 下面我们通过一个例子来详细讲解下
我正在努力提高网站的可访问性,但我不知道如何在页脚中标记社交媒体链接列表。这些链接指向我在 facecook、twitter 等上的帐户。我不想用 role="navigation" 标记这些链接,因
说现在是 6 点,我有一个 Timer 并在 10 点安排了一个 TimerTask。之后,System DateTime 被其他服务(例如 ntp)调整为 9 点钟。我仍然希望我的 TimerTas
就目前而言,这个问题不适合我们的问答形式。我们希望答案得到事实、引用资料或专业知识的支持,但这个问题可能会引发辩论、争论、投票或扩展讨论。如果您觉得这个问题可以改进并可能重新打开,visit the
我就废话不多说了,大家还是直接看代码吧~ ? 1
Maven系列1 1.什么是Maven? Maven是一个项目管理工具,它包含了一个对象模型。一组标准集合,一个依赖管理系统。和用来运行定义在生命周期阶段中插件目标和逻辑。 核心功能 Mav
我是一名优秀的程序员,十分优秀!