I am attempting to write a Spark DataFrame to an Oracle table via JDBC. I am able to successfully connect and query the database but when I go to create a new table like this:
df.write.jdbc('jdbc:oracle:thin:@host:port/service', create_table,
              mode='overwrite',
              properties={'user': 'user', 'password': 'password'})
I get the error message java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor
I suspect this has something to do with the column ROW_ID, which df.dtypes reports as bigint. The ROW_IDs look something like the table below, which doesn't seem to agree with the inferred datatype.
ROW_ID |
AABBVMAGRAAAJfsAAA |
AABBVMAGRAAAJftAAA |
AABBVMAGRAAAJfyAAB |
AABBVMAGRAAAJfvAAB |
AABBVMAGRAAAJfwAAB |
AABBVMAGRAAAJf3AAI |
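A quick sanity check of that suspicion, in plain Python. This is only a sketch, assuming the values are Oracle extended ROWIDs (18 base-64 characters: 6 for the data object, 3 for the relative file, 6 for the block, 3 for the row). The helper names split_rowid and is_integer_like are hypothetical, not part of any library.

```python
import re

# Assumption: extended ROWIDs are 18 characters from the base-64 alphabet
# A-Z, a-z, 0-9, + and /.
ROWID_PATTERN = re.compile(r"^[A-Za-z0-9+/]{18}$")


def split_rowid(rowid):
    """Split an extended ROWID into its four segments (hypothetical helper)."""
    if not ROWID_PATTERN.match(rowid):
        raise ValueError(f"not an extended ROWID: {rowid!r}")
    return {
        "data_object": rowid[0:6],
        "relative_file": rowid[6:9],
        "block": rowid[9:15],
        "row": rowid[15:18],
    }


def is_integer_like(value):
    """Return True if the value can be parsed as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


# Two of the sample values from the table above.
samples = ["AABBVMAGRAAAJfsAAA", "AABBVMAGRAAAJf3AAI"]
parts = [split_rowid(s) for s in samples]
integer_like = [is_integer_like(s) for s in samples]
```

None of the sample values parses as an integer, so they cannot be read into a bigint column, while all of them fit in an 18-character string.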
EDIT:
I tried casting the datatype from bigint to string using:
from pyspark.sql.functions import col
from pyspark.sql.types import StringType
correct_dtypes = df.withColumn('ROW_ID', col('ROW_ID').cast(StringType()))
correct_dtypes.write.jdbc('jdbc:oracle:thin:@host:port/service', create_table,
mode='overwrite',
properties={'user': 'user', 'password': 'password'})
But I am still getting the same error.
Recommended answers:
One possible solution is to pass the createTableColumnTypes option when saving, so that the troublesome bigint column is created as a varchar2 column on the Oracle side:
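# createTableColumnTypes takes Spark SQL type names, so use VARCHAR here;
# Oracle creates a VARCHAR column as VARCHAR2.
(correct_dtypes.write
    .option("createTableColumnTypes", "ROW_ID VARCHAR(18)")
    .jdbc('jdbc:oracle:thin:@host:port/service',
          create_table, mode='overwrite',
          properties={'user': 'user',
                      'password': 'password'}))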
The cause is the type of the ROWID column, which is not a varchar. In PySpark you have to declare its type explicitly with the customSchema option when reading:
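# spark is the SparkSession; 'driver_class', 'url_jdbc' and 'table_name' are placeholders
df = spark.read \
    .format("jdbc") \
    .option('driver', 'driver_class') \
    .option('url', 'url_jdbc') \
    .option('dbtable', 'table_name') \
    .option('customSchema', 'ROWID VARCHAR(256), ORA_ROWSCN BIGINT') \
    .load()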
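The two answers can also be combined: declare the column type when reading and again when Spark creates the target table. The following is a rough sketch, not a tested recipe. It assumes the column is named ROW_ID as in the question; 'url_jdbc', 'driver_class' and 'source_table' are placeholders, and column_types and copy_table are hypothetical helper names.

```python
def column_types(overrides):
    """Format {'COL': 'TYPE'} as the 'COL TYPE, COL TYPE' string that
    customSchema and createTableColumnTypes both take (hypothetical helper)."""
    return ", ".join(f"{name} {sql_type}" for name, sql_type in overrides.items())


# Placeholders: replace url, driver and dbtable with real values.
read_options = {
    "url": "url_jdbc",
    "driver": "driver_class",
    "dbtable": "source_table",
    # Read the ROWID-valued column as a string instead of the inferred bigint.
    "customSchema": column_types({"ROW_ID": "VARCHAR(18)"}),
}

write_options = {
    # Spark SQL type name; assumed to end up as VARCHAR2 on the Oracle side.
    "createTableColumnTypes": column_types({"ROW_ID": "VARCHAR(18)"}),
}


def copy_table(spark, target_table, properties):
    """Read with an explicit schema, then overwrite target_table.

    spark is an existing SparkSession; properties holds user and password.
    """
    df = spark.read.format("jdbc").options(**read_options).load()
    (df.write
        .options(**write_options)
        .jdbc(read_options["url"], target_table,
              mode="overwrite", properties=properties))
```

It would be called as copy_table(spark, create_table, {'user': 'user', 'password': 'password'}). Because Spark reads lazily, the getLong error only shows up at write time, which is why the read side is the place to fix the type.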