gpt4 book ai didi

mysql - Spark - 通过 Zeppelin EMR 连接到 mysql

转载 作者:可可西里 更新时间:2023-11-01 08:38:53 26 4
gpt4 key购买 nike

我正在尝试从 AWS EMR - Zeppelin 笔记本连接到 MySQL 实例。将 mysql 连接器加载到此位置 - /usr/lib/spark/jars/mysql-connector-java-5.0.4-bin.jar 。并将其添加为齐柏林飞艇解释器中的工件。

启动驱动程序,

Class.forName("com.mysql.jdbc.Driver")

res77: Class[_] = class com.mysql.jdbc.Driver

像这里一样使用 Scala 代码,

试验 1,

val jdbcDF = spark.read.format("jdbc").options(
Map("url" -> "jdbc:mysql://our-host.readm.co.nz:3306/testdb?user=testuser&password=****",
"dbtable" -> "testdb.tablename",
"driver" -> "com.mysql.jdbc.Driver",
"partitionColumn" -> "id", "lowerBound" -> "1", "upperBound" -> "41514638", "numPartitions" -> "20"
)).load()

得到这个错误,

com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.net.SocketException
MESSAGE: java.net.ConnectException: Connection timed out (Connection timed out)
STACKTRACE:
java.net.SocketException: java.net.ConnectException: Connection timed out (Connection timed out)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:156)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:276)
at com.mysql.jdbc.Connection.createNewIO(Connection.java:2666)
at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at <init>(<console>:50)
at <init>(<console>:55)
at <init>(<console>:57)
at <init>(<console>:59)
at <init>(<console>:61)
at <init>(<console>:63)
at <init>(<console>:65)
at <init>(<console>:67)
at <init>(<console>:69)
at <init>(<console>:71)
at <init>(<console>:73)
at <init>(<console>:75)
at <init>(<console>:77)
at <init>(<console>:79)
at <init>(<console>:81)
at <init>(<console>:83)
at <init>(<console>:85)
at <init>(<console>:87)
at <init>(<console>:89)
at <init>(<console>:91)
at <init>(<console>:93)
at .<init>(<console>:97)
at .<clinit>(<console>)
at .$print$lzycompute(<console>:7)
at .$print(<console>:6)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at sun.reflect.GeneratedMethodAccessor369.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1000)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1205)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1172)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1165)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
** END NESTED EXCEPTION **
Last packet sent to the server was 1 ms ago.
at com.mysql.jdbc.Connection.createNewIO(Connection.java:2741)
at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 58 elided

然后是试验2,

val partitionColumn = "id"
val dbUrl = "jdbc:mysql://our-host.readm.co.nz:3306"

val lowerBound = 1
val upperBound = 41514638

val jdbcDF = spark.read.format("jdbc")
.option("url", dbUrl)
.option("databaseName", "testdb")
.option("dbtable", "tablename")
.option("user", "username")
.option("password", "*****")
.option("driver","com.mysql.jdbc.Driver")
.option("partitionColumn", partitionColumn)
.option("lowerBound", lowerBound)
.option("upperBound", upperBound)
.option("numPartitions", 20)
.load()

得到这个错误,

java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 58 elided

试验 3,

import java.util.Properties

val table = "tablename"
val url = "jdbc:mysql://our-host.readm.co.nz"
val prop = new Properties()
prop.put("driver","com.mysql.jdbc.Driver")
prop.put("user","username")
prop.put("password","****")
prop.put("databaseName","testdb")

val jdbcDF = spark.read.jdbc(url,
table,
prop)

得到这个错误,

java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:72)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
... 58 elided

不确定这里有什么问题,可以帮助我吗?谢谢。

编辑:- 这是我在尝试@geo 给出的答案后得到的结果

com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.net.SocketException
MESSAGE: java.net.ConnectException: Connection timed out (Connection timed out)
STACKTRACE:
java.net.SocketException: java.net.ConnectException: Connection timed out (Connection timed out)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:156)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:276)
at com.mysql.jdbc.Connection.createNewIO(Connection.java:2666)
at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at <init>(<console>:40)
at <init>(<console>:45)
at <init>(<console>:47)
at <init>(<console>:49)
at <init>(<console>:51)
at <init>(<console>:53)
at <init>(<console>:55)
at <init>(<console>:57)
at <init>(<console>:59)
at <init>(<console>:61)
at <init>(<console>:63)
at <init>(<console>:65)
at <init>(<console>:67)
at .<init>(<console>:71)
at .<clinit>(<console>)
at .$print$lzycompute(<console>:7)
at .$print(<console>:6)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at sun.reflect.GeneratedMethodAccessor340.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1000)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:1205)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1172)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:1165)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:498)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
** END NESTED EXCEPTION **
Last packet sent to the server was 7 ms ago.
at com.mysql.jdbc.Connection.createNewIO(Connection.java:2741)
at com.mysql.jdbc.Connection.<init>(Connection.java:1531)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:266)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
... 50 elided

最佳答案

假设 JDBC 解释器在 Zeppelin 中没有任何问题,这对我有用:

// DB connection
val options = Map(
"url" -> "jdbc:mysql://localhost:3306/dbname",
"dbtable" -> "dbtable",
"user" -> "user",
"password" -> "password"
)

val data = spark
.read
.format("jdbc")
.options(options)
.load

不言而喻,应该有一个 mysql 进程运行在 localhost:3306(或您托管数据库的任何地方),并且防火墙/iptables 允许您进出访问/从那个过程。

关于mysql - Spark - 通过 Zeppelin EMR 连接到 mysql,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/50574815/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com