
python - Create a DataFrame containing the schema data of each file

Reposted. Author: 行者123. Updated: 2023-12-01 09:06:52

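I am trying to create a DataFrame and then run a for loop over a set of files, adding one row to the DataFrame for each file. Each row should contain the file name and that file's schema details.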

from pyspark.sql.types import StructType, StructField, StringType

# Schema: the file name plus the type of each column, all stored as strings
schema = StructType([
    StructField("filename", StringType(), True),
    StructField("converteddate", StringType(), True),
    StructField("eventdate", StringType(), True)
])


# Create empty dataframe
df = spark.createDataFrame(sc.emptyRDD(), schema)


# mvv_list holds the file paths to inspect
for files in mvv_list:
    loadName = files
    videoData = spark.read\
        .format('parquet')\
        .options(header='true', inferSchema='true')\
        .load(loadName)
    # dtypes is a list of (column name, type string) pairs
    dataTypeList = videoData.dtypes
    two = dataTypeList[:2]
    print(loadName)
    print(two)

#mnt/master-video/year=2018/month=03/day=24/part-00004-tid-28948428924977-e0fc2-c85b-4296-8a05-94c5af6-2427-c000.snappy.parquet
#[('converteddate', 'timestamp'), ('eventdate', 'timestamp')]

#mnt/master-video/year=2017/month=05/day=12/part-00004-tid-2894842977-e0f21c2-c85b-4296-8a05-94c5af6-2427-c000.snappy.parquet
#[('converteddate', 'timestamp'), ('eventdate', 'date')]

#mnt/master-video/year=2016/month=03/day=24/part-00004-tid-2884924977-e0f2512-c8b-4296-8a05-945a6-2427-c000.snappy.parquet
#[('converteddate', 'timestamp'), ('eventdate', 'string')]

I am struggling to create a row and append it to the DataFrame.

Desired output

+-----------------------------+-----------------+---------------------+
|filename                     |converteddate    |eventdate            |
+-----------------------------+-----------------+---------------------+
|mnt/master-video/year=2018...|timestamp        |timestamp            |
|mnt/master-video/year=2017...|timestamp        |date                 |
|mnt/master-video/year=2016...|timestamp        |string               |
+-----------------------------+-----------------+---------------------+

Best answer

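One approach is to build the required data as a list and then create the DataFrame from it (rather than trying to append rows). Spark DataFrames are immutable, so appending a row would mean creating a new DataFrame through a union on every iteration.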

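from pyspark.sql.types import StructType, StructField, StringType

# Collect one (filename, converteddate type, eventdate type) tuple per file
data = []
for files in mvv_list:
    loadName = files
    videoData = spark.read\
        .format('parquet')\
        .options(header='true', inferSchema='true')\
        .load(loadName)
    # dtypes is a list of (column name, type string) pairs; a dict allows lookup by name
    dataTypeDict = dict(videoData.dtypes)
    data.append((loadName, dataTypeDict['converteddate'], dataTypeDict['eventdate']))

schema = StructType([
    StructField("filename", StringType(), True),
    StructField("converteddate", StringType(), True),
    StructField("eventdate", StringType(), True)
])

# Build the DataFrame once from the collected rows
df = spark.createDataFrame(data, schema)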
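
One caveat with the dictionary lookup above: it raises a KeyError if a file lacks one of the two columns. The following is a minimal sketch of a row-building helper that returns None for a missing column instead. The name `schema_row`, the sample file name, and the `missingcol` column are hypothetical and only serve the illustration; the input is assumed to have the shape of `DataFrame.dtypes`, a list of (column name, type string) pairs. It runs without Spark so the row logic can be checked on its own.

```python
from typing import List, Optional, Sequence, Tuple


def schema_row(
    filename: str,
    dtypes: Sequence[Tuple[str, str]],
    columns: Sequence[str],
) -> Tuple[Optional[str], ...]:
    """Build one output row: the file name followed by the type of each requested column.

    `dtypes` is assumed to look like DataFrame.dtypes. A column that is
    absent from the file yields None rather than raising KeyError.
    """
    type_by_column = dict(dtypes)
    return (filename,) + tuple(type_by_column.get(name) for name in columns)


# Hypothetical sample input, shaped like the dtypes printed in the question
sample_dtypes: List[Tuple[str, str]] = [
    ("converteddate", "timestamp"),
    ("eventdate", "date"),
]

row = schema_row(
    "example.parquet",
    sample_dtypes,
    ["converteddate", "eventdate", "missingcol"],
)
print(row)  # ('example.parquet', 'timestamp', 'date', None)
```

Inside the loop, the append could then read `data.append(schema_row(loadName, videoData.dtypes, ['converteddate', 'eventdate']))`. Because every field in the schema is declared nullable (the `True` argument), a None entry should be accepted when the DataFrame is created.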

Regarding "python - Create a DataFrame containing the schema data of each file", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51970729/
