I have installed Spark and its components locally, and I can run PySpark code in Jupyter, iPython, and via spark-submit - but I get the following warnings:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 07:54:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
The .py file executes, but should I be worried about these warnings? I don't want to start writing code and then find out later that it won't run. FYI, PySpark is installed locally. Here is the code:
test.txt:
This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT
test.py:
import pyspark
sc = pyspark.SparkContext.getOrCreate()
# sc = pyspark.SparkContext(master='local[*]') # or 'local[2]' ?
lines = sc.textFile("test.txt")
llist = lines.collect()
for line in llist:
    print(line)
print("SparkContext version:\t", sc.version) # return SparkContext version
print("python version:\t", sc.pythonVer) # return python version
print("master URL:\t", sc.master) # master URL to connect to
print("path where spark is installed on worker nodes:\t", sc.sparkHome) # path where spark is installed on worker nodes
print("name of spark user running SparkContext:\t", sc.sparkUser()) # name of spark user running SparkContext
Paths:
export SPARK_HOME=/Users/ayubk/spark-3.0.1-bin-hadoop3.2
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3
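One detail worth noting about the PATH line above: the spark-submit and pyspark launchers live under $SPARK_HOME/bin, so a common variant (an adjustment on my part, not what the profile above uses) is:

export PATH=$SPARK_HOME/bin:$PATH

With that, spark-submit can be invoked directly instead of via the full spark-3.0.1-bin-hadoop3.2/bin/ prefix used below.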
bash terminal:
$ spark-3.0.1-bin-hadoop3.2/bin/spark-submit test.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 08:00:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/27 08:00:01 INFO SparkContext: Running Spark version 3.0.1
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO ResourceUtils: Resources for spark.driver:
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO SparkContext: Submitted application: test.py
20/12/27 08:00:01 INFO SecurityManager: Changing view acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing view acls groups to:
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls groups to:
20/12/27 08:00:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ayubk); groups with view permissions: Set(); users with modify permissions: Set(ayubk); groups with modify permissions: Set()
20/12/27 08:00:02 INFO Utils: Successfully started service 'sparkDriver' on port 51254.
20/12/27 08:00:02 INFO SparkEnv: Registering MapOutputTracker
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMaster
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/27 08:00:02 INFO DiskBlockManager: Created local directory at /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/blockmgr-a99e3df1-6d15-4158-8e09-568910c2b045
20/12/27 08:00:02 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
20/12/27 08:00:02 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/27 08:00:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/12/27 08:00:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.101:4040
20/12/27 08:00:02 INFO Executor: Starting executor ID driver on host 192.168.1.101
20/12/27 08:00:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51255.
20/12/27 08:00:02 INFO NettyBlockTransferService: Server created on 192.168.1.101:51255
20/12/27 08:00:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/12/27 08:00:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:51255 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 175.8 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 27.1 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:51255 (size: 27.1 KiB, free: 434.4 MiB)
20/12/27 08:00:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
20/12/27 08:00:04 INFO FileInputFormat: Total input files to process : 1
20/12/27 08:00:04 INFO SparkContext: Starting job: collect at /Users/ayubk/test.py:9
20/12/27 08:00:04 INFO DAGScheduler: Got job 0 (collect at /Users/ayubk/test.py:9) with 2 output partitions
20/12/27 08:00:04 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /Users/ayubk/test.py:9)
20/12/27 08:00:04 INFO DAGScheduler: Parents of final stage: List()
20/12/27 08:00:04 INFO DAGScheduler: Missing parents: List()
20/12/27 08:00:04 INFO DAGScheduler: Submitting ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0), which has no missing parents
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.101:51255 (size: 2.3 KiB, free: 434.4 MiB)
20/12/27 08:00:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1223
20/12/27 08:00:04 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
20/12/27 08:00:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/12/27 08:00:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.101, executor driver, partition 0, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.101, executor driver, partition 1, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/12/27 08:00:04 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:52+52
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:0+52
20/12/27 08:00:04 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 956 bytes result sent to driver
20/12/27 08:00:04 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1003 bytes result sent to driver
20/12/27 08:00:04 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 156 ms on 192.168.1.101 (executor driver) (1/2)
20/12/27 08:00:04 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 142 ms on 192.168.1.101 (executor driver) (2/2)
20/12/27 08:00:04 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/12/27 08:00:04 INFO DAGScheduler: ResultStage 0 (collect at /Users/ayubk/test.py:9) finished in 0.241 s
20/12/27 08:00:04 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/12/27 08:00:04 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/12/27 08:00:04 INFO DAGScheduler: Job 0 finished: collect at /Users/ayubk/test.py:9, took 0.296115 s
This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT
SparkContext version: 3.0.1
python version: 3.7
master URL: local[*]
path where spark is installed on worker nodes: None
name of spark user running SparkContext: ayubk
20/12/27 08:00:04 INFO SparkContext: Invoking stop() from shutdown hook
20/12/27 08:00:04 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
20/12/27 08:00:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/12/27 08:00:04 INFO MemoryStore: MemoryStore cleared
20/12/27 08:00:04 INFO BlockManager: BlockManager stopped
20/12/27 08:00:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/27 08:00:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/12/27 08:00:04 INFO SparkContext: Successfully stopped SparkContext
20/12/27 08:00:04 INFO ShutdownHookManager: Shutdown hook called
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-76d186fb-cf42-4898-92db-050a73f9fcb7
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c/pyspark-ee1fe6ab-a27f-4be6-b8d8-06594704da12
Edit:
brew update
brew tap adoptopenjdk/openjdk
brew search jdk
brew install --cask adoptopenjdk8
However, when I run
java -version
I get this:
openjdk version "13" 2019-09-17
OpenJDK Runtime Environment (build 13+33)
OpenJDK 64-Bit Server VM (build 13+33, mixed mode, sharing)
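Presumably java -version still reports 13 because the shell is still resolving the newer JDK first. On macOS the installed JDKs and their paths can be listed with the java_home helper (a diagnostic step added here as a suggestion, not from the original post):

/usr/libexec/java_home -V   # lists every installed JDK with its version and path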
Best Answer
Install Java 8 instead of Java 11; Java 11 is known to produce these kinds of warnings with Spark.
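A minimal sketch of how that could look on macOS after installing AdoptOpenJDK 8 with brew as above (the exact exports are an assumption; adjust to wherever the JDK actually lands):

# Point JAVA_HOME at the Java 8 installation so Spark's launcher picks it up
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
export PATH=$JAVA_HOME/bin:$PATH

After reloading the shell, java -version should report 1.8, and Java 8 does not emit the module-system "illegal reflective access" banner that Java 9+ does.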
Regarding "python - PySpark 'illegal reflective access operation' when executing in the terminal", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65463877/