
scala - How to add a header to a CSV table in Scala Spark

Reposted. Author: 行者123. Updated: 2023-12-02 23:27:49

I am trying to read data from a table stored in a CSV file. The file has no header, so when I query the table with Spark SQL, every result comes back null.

I tried creating a schema struct. It shows up when I call printSchema(), but when I run ( select * from tableName ) it does not work: all values are null. I also tried StructType().add( colName ) instead of StructField, with the same result.

val schemaStruct1 = StructType(
  StructField( "AgreementVersionID", IntegerType, true )::
  StructField( "ProgramID", IntegerType, true )::
  StructField( "AgreementID", IntegerType, true )::
  StructField( "AgreementVersionNumber", IntegerType, true )::
  StructField( "AgreementStatusID", IntegerType, true )::
  StructField( "AgreementEffectiveDate", DateType, true )::
  StructField( "AgreementEffectiveDateDay", IntegerType, true )::
  StructField( "AgreementEndDate", DateType, true )::
  StructField( "AgreementEndDateDay", IntegerType, true )::
  StructField( "MasterAgreementNumber", IntegerType, true )::
  StructField( "MasterAgreementEffectiveDate", DateType, true )::
  StructField( "MasterAgreementEffectiveDateDay", IntegerType, true )::
  StructField( "MasterAgreementEndDate", DateType, true )::
  StructField( "MasterAgreementEndDateDay", IntegerType, true )::
  StructField( "SalesContactName", StringType, true )::
  StructField( "RevenueSubID", IntegerType, true )::
  StructField( "LicenseAgreementContractTypeID", IntegerType, true )::Nil
)

val df1 = session.read
  .option( "header", true )
  .option( "delimiter", "," )
  .schema( schemaStruct1 )
  .csv( LicenseAgrmtMaster )
df1.printSchema()
df1.createOrReplaceTempView( "LicenseAgrmtMaster" )

Printing the schema gives the following, which is correct:

root
|-- AgreementVersionID: integer (nullable = true)
|-- ProgramID: integer (nullable = true)
|-- AgreementID: integer (nullable = true)
|-- AgreementVersionNumber: integer (nullable = true)
|-- AgreementStatusID: integer (nullable = true)
|-- AgreementEffectiveDate: date (nullable = true)
|-- AgreementEffectiveDateDay: integer (nullable = true)
|-- AgreementEndDate: date (nullable = true)
|-- AgreementEndDateDay: integer (nullable = true)
|-- MasterAgreementNumber: integer (nullable = true)
|-- MasterAgreementEffectiveDate: date (nullable = true)
|-- MasterAgreementEffectiveDateDay: integer (nullable = true)
|-- MasterAgreementEndDate: date (nullable = true)
|-- MasterAgreementEndDateDay: integer (nullable = true)
|-- SalesContactName: string (nullable = true)
|-- RevenueSubID: integer (nullable = true)
|-- LicenseAgreementContractTypeID: integer (nullable = true)

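That is correct, but querying the view gives me a table that contains only nulls, even though the source table is not filled with nulls. I need to be able to read this table so that I can join it to another table and complete a stored procedure.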

Best answer

I suggest following the steps below; you can then adapt the code as needed.

// Read the file without the "header" option, so the first line is treated as data
val df = session.read.option( "delimiter", "," ).csv("<Path of your file/dir>")
// Example names only: list exactly as many names as the file has columns
val columnNames = Seq("name", "id")
val dfWithHeader = df.toDF(columnNames:_*)
// dfWithHeader now has named columns and the data; check the column types and cast where needed
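
The steps above can be sketched end to end as follows. This is a minimal illustration, not the asker's actual job: the object name, the local master, the temporary sample file, the three-column subset of the schema, and the "yyyy-MM-dd" date pattern are all assumptions made for the example. It also shows a second variant that keeps an explicit schema. One possible cause of the all-null rows, which the question does not confirm, is that a value fails to parse against the declared type (for instance a date in a different format); in the default PERMISSIVE mode Spark returns nulls for such values instead of raising an error, and with "header" set to true on a header-less file the first data row is consumed as the header.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}
import org.apache.spark.sql.types.{DateType, IntegerType, StructField, StructType}

object HeaderlessCsvExample {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .appName("headerless-csv-example")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    // Hypothetical sample data standing in for the real header-less file
    val tmp = Files.createTempFile("agreements", ".csv")
    Files.write(tmp, "1,10,2019-01-31\n2,20,2019-02-28\n".getBytes(StandardCharsets.UTF_8))
    val path = tmp.toString

    // Variant 1: read everything as strings, name the columns, then cast
    val raw = session.read
      .option("header", "false") // the file has no header row
      .option("delimiter", ",")
      .csv(path)

    // The number of names must match the number of columns in the file
    val columnNames = Seq("AgreementVersionID", "ProgramID", "AgreementEffectiveDate")
    val typed = raw.toDF(columnNames: _*)
      .withColumn("AgreementVersionID", col("AgreementVersionID").cast(IntegerType))
      .withColumn("ProgramID", col("ProgramID").cast(IntegerType))
      .withColumn("AgreementEffectiveDate",
        to_date(col("AgreementEffectiveDate"), "yyyy-MM-dd")) // assumed date pattern

    typed.createOrReplaceTempView("LicenseAgrmtMaster")
    session.sql("SELECT * FROM LicenseAgrmtMaster").show()

    // Variant 2: keep an explicit schema, but fail loudly on unparseable rows
    val schema = StructType(Seq(
      StructField("AgreementVersionID", IntegerType, nullable = true),
      StructField("ProgramID", IntegerType, nullable = true),
      StructField("AgreementEffectiveDate", DateType, nullable = true)
    ))

    val strict = session.read
      .option("header", "false")
      .option("delimiter", ",")
      .option("dateFormat", "yyyy-MM-dd") // assumed date pattern
      .option("mode", "FAILFAST")         // raise an error instead of returning nulls
      .schema(schema)
      .csv(path)

    strict.show()

    session.stop()
  }
}
```

Running this requires the spark-sql library on the classpath. If the FAILFAST read throws on the real file, the error message should point at the record that does not match the declared types, which is usually quicker than inspecting null columns by hand.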

Regarding scala - How to add a header to a CSV table in Scala Spark, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57281285/
