
python - PySpark error: dataType should be an instance of DataType

Reposted · Author: 行者123 · Updated: 2023-12-03 19:45:06

I need to extract some data from a pipelined RDD, but when converting it to a DataFrame I get the following error:

    Traceback (most recent call last):
      File "/home/karan/Desktop/meds.py", line 42, in <module>
        relevantToSymEntered(newrdd)
      File "/home/karan/Desktop/meds.py", line 26, in relevantToSymEntered
        mat = spark.createDataFrame(self,StructType([StructField("Prescribed medicine",StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType)]))
      File "/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7/python/pyspark/sql/types.py", line 409, in __init__
        "dataType %s should be an instance of %s" % (dataType, DataType)
    AssertionError: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>

  • My error is of a different type than in similar questions: those hit a TypeError, whereas I am facing an AssertionError.

  • My problem is not about converting between data types.

  • I have already tried toDF(), but it changes the column names in an undesirable way.

    import findspark
    findspark.init('/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7')
    from pyspark.sql import SQLContext
    from pyspark.sql.types import StructType, StringType, IntegerType, StructField, ArrayType
    from pyspark import SparkConf, SparkContext
    import pandas as pd

    def reduceColoumns(self):
        # Keep the medicine column plus a list of the remaining columns.
        try:
            filtered = self.rdd.map(lambda x: (x["Prescribed medicine"],
                                               [x["Disease"], x["ID"], x["Symptoms Recorded"], x["Severeness"]]))
        except Exception as e:
            print("Error in reduceColoumns:- ")
            print(e)
        return filtered

    def cleanData(self, s):
        # Keep only the rows whose disease field matches the entered disease.
        try:
            self.zipWithIndex
        except Exception as e:
            print("Error in cleanData:- ")
            print(e)
        return self.filter(lambda x: x[1][0] == s)

    def relevantToSymEntered(self):
        mat = spark.createDataFrame(self, StructType([StructField("Prescribed medicine", StringType), StructField(["Disease", "ID", "Symptoms Recorded", "Severeness"], ArrayType)]))
        #mat = mat.rdd.map(lambda x: (x["Prescribed medicine"], [x["ID"], x["Symptoms Recorded"], x["Severeness"]]))
        print(type(mat))


    conf = SparkConf().setMaster("local[*]").setAppName("MovieSimilarities")
    sc = SparkContext(conf=conf)
    spark = SQLContext(sc)
    rdd = spark.read.csv("/home/karan/Desktop/ExportExcel2.csv", header=True, sep=",", multiLine="True")

    print(rdd)
    newrdd = reduceColoumns(rdd)
    x = input("Enter the disease-")
    newrdd = cleanData(newrdd, x)
    relevantToSymEntered(newrdd)

Best answer

Replace:

    StructType([StructField("Prescribed medicine",StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType)])

with:

    StructType([StructField("Prescribed medicine",StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType())])

You need to instantiate the class: StringType is the class object itself, while StringType() is a DataType instance, which is what StructField expects.

Regarding python - PySpark error: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56017410/
