
scala - Cannot resolve given input columns when using SQL on a DataFrame


  • Platform: IntelliJ 2018.2.4 (Community Edition)
  • SDK: 1.8.0_144
  • OS: Windows 7

  As a soon-to-be graduate, I am working on my first big data task, and I have run into a problem:

    Code
    // Loading my csv file here
    val df = spark.read
      .format("csv")
      .option("header", "true")
      .option("delimiter", ";")
      .load("/user/sfrtech/dilan/yesterdaycsv.csv")
      .toDF()


    // Select required columns
    val formatedDf = df.select("`TcRun.ID`", "`Td.Name`", "`TcRun.Startdate`", "`TcRun.EndDate`", "`O.Sim.MsisdnVoice`", "`T.Sim.MsisdnVoice`", "`ErrorCause`")

    // SQL on the DF in order to get useful data
    formatedDf.createOrReplaceTempView("yesterday")
    val sqlDF = spark.sql("" +
      " SELECT TcRun.Id, Td.Name, TcRun.Startdate, TcRun.EndDate," +
      " SUBSTR(O.Sim.MsisdnVoice, 7, 14) as MsisdnO," +
      " SUBSTR(T.Sim.MsisdnVoice, 7, 14) as MsisdnT, ErrorCause" +
      " FROM yesterday" +
      " WHERE Td.Name like '%RING'" +
      " AND MsisdnO is not null" +
      " AND MsisdnT is not null" +
      " AND ErrorCause = 'NoError'")

    The error:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Td.Name' given input columns: [TcRun.EndDate, TcRun.Startdate, O.Sim.MsisdnVoice, TcRun.ID, Td.Name, T.Sim.MsisdnVoice, ErrorCause]; line 1 pos 177;



    I guess the problem comes from my column names containing ".", but I don't know how to solve it, even when I use backticks.
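
    For context, the analyzer trips here because, without backticks inside the SQL string itself, Spark parses Td.Name as column Name of a table named Td. A minimal sketch of the same query with every dotted name backtick-quoted, assuming the temp view from above; the null checks are also moved back to the base expressions, since WHERE cannot see SELECT aliases:

    // Backtick-quote every dotted column so Spark treats the dot as part
    // of the name rather than as a table/struct qualifier.
    val sqlDF = spark.sql(
      """SELECT `TcRun.ID`, `Td.Name`, `TcRun.Startdate`, `TcRun.EndDate`,
        |       SUBSTR(`O.Sim.MsisdnVoice`, 7, 14) AS MsisdnO,
        |       SUBSTR(`T.Sim.MsisdnVoice`, 7, 14) AS MsisdnT,
        |       ErrorCause
        |  FROM yesterday
        | WHERE `Td.Name` LIKE '%RING'
        |   AND `O.Sim.MsisdnVoice` IS NOT NULL
        |   AND `T.Sim.MsisdnVoice` IS NOT NULL
        |   AND ErrorCause = 'NoError'""".stripMargin)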

    Solution
    // Rename every column to a dot-free name
    val newColumns = Seq("id", "name", "startDate", "endDate", "msisdnO", "msisdnT", "error")
    val dfRenamed = df.toDF(newColumns: _*)

    dfRenamed.printSchema
    // root
    // |-- id: string (nullable = false)
    // |-- name: string (nullable = false)
    // |-- startDate: string (nullable = false)
    // |-- endDate: string (nullable = false)
    // |-- msisdnO: string (nullable = false)
    // |-- msisdnT: string (nullable = false)
    // |-- error: string (nullable = false)
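
    With dot-free names in place, the query from the question can then run without any quoting; a sketch assuming the same view name, again with the null checks on the base columns rather than the SELECT aliases:

    dfRenamed.createOrReplaceTempView("yesterday")

    val sqlDF = spark.sql(
      """SELECT id, name, startDate, endDate,
        |       SUBSTR(msisdnO, 7, 14) AS msisdnO,
        |       SUBSTR(msisdnT, 7, 14) AS msisdnT
        |  FROM yesterday
        | WHERE name LIKE '%RING'
        |   AND msisdnO IS NOT NULL
        |   AND msisdnT IS NOT NULL
        |   AND error = 'NoError'""".stripMargin)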

    Best Answer

    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Define column names of csv without "."
    val schema = StructType(Array(
      StructField("id", StringType, true),
      StructField("name", StringType, true)
      // etc. for the remaining columns
    ))

    // Load csv file without headers and specify your schema
    val df = spark.read
      .format("csv")
      .option("header", "false")
      .option("delimiter", ";")
      .schema(schema)
      .load("/user/sfrtech/dilan/yesterdaycsv.csv")
      .toDF()
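
    One caveat with this approach: the file itself still begins with a header line, and with header set to false Spark will read that line as an ordinary data row. A minimal sketch of one way to drop it, assuming the first column's header cell is the literal string TcRun.ID:

    import org.apache.spark.sql.functions.col

    // The old header line is now a data row; filter it out by matching
    // a known header value (assumes the first cell reads "TcRun.ID").
    val data = df.filter(col("id") =!= "TcRun.ID")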

    Then select your columns as needed; note the $ syntax requires the session's implicits in scope:

    import spark.implicits._

    df.select($"id", $"name" /* etc. */)
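
    The whole query can equally stay in the DataFrame API; a minimal sketch using org.apache.spark.sql.functions, assuming the schema above defines all seven dot-free columns and binding the result to a hypothetical name:

    import org.apache.spark.sql.functions.{col, substring}

    val result = df
      .where(col("name").like("%RING") &&
        col("msisdnO").isNotNull &&
        col("msisdnT").isNotNull &&
        col("error") === "NoError")
      .select(col("id"), col("name"), col("startDate"), col("endDate"),
        substring(col("msisdnO"), 7, 14).as("msisdnO"),
        substring(col("msisdnT"), 7, 14).as("msisdnT"))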

    This question about scala - cannot resolve given input columns when using SQL on a DataFrame is based on a similar question we found on Stack Overflow: https://stackoverflow.com/questions/53173963/
