
pyspark - How to drop empty columns from a pyspark dataframe

Reposted. Author: 行者123 Updated: 2023-12-04 17:00:46


We have a dataframe:

names = spark.read.csv("name.csv", header="true", inferSchema="true").rdd

I want to do this:

res = names.filter(lambda f: f['Name'] == "Diwakar").map(lambda name: (name['Name'], name['Age']))
res.toDF(['Name', 'Age']).write.csv("final", mode="overwrite", header="true")

But the empty columns are causing a problem.
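One way to avoid the problem entirely is to strip blank-named columns out of the CSV header before Spark ever sees them. A minimal sketch using only the standard library, with a hypothetical inline CSV standing in for name.csv (the column names and values are assumptions for illustration):

```python
import csv
import io

# Hypothetical CSV content with blank header cells, standing in for name.csv
raw = "Name,,Age,\nDiwakar,,25,\nAsha,,30,\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)

# Indices of the columns whose header cell is non-blank
keep = [i for i, h in enumerate(header) if h.strip()]

cleaned_header = [header[i] for i in keep]
cleaned_rows = [[row[i] for i in keep] for row in reader]

print(cleaned_header)  # ['Name', 'Age']
print(cleaned_rows)    # [['Diwakar', '25'], ['Asha', '30']]
```

The cleaned data could then be written back out and loaded with spark.read.csv as in the question.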

Best Answer

Just use a simple select. I'm assuming the empty columns are named " ".

For the input

df = sqlContext.createDataFrame([(1,"", "x"," "), (2,"", "b"," "), (5,"", "c"," "), (8,"", "d"," ")], ("st"," ", "ani"," "))

+---+---+---+---+
| st| |ani| |
+---+---+---+---+
| 1| | x| |
| 2| | b| |
| 5| | c| |
| 8| | d| |
+---+---+---+---+

a = list(set(df.columns))
a.remove(" ")
df = df.select(a)
df.show()

+---+---+
|ani| st|
+---+---+
| x| 1|
| b| 2|
| c| 5|
| d| 8|
+---+---+
"""
Do your Operations
"""

After the steps above, continue with your task. This removes the blank columns.

New edit:

There is no option to drop empty columns while reading; you have to do it yourself.

You can do it like this:

a = list(set(df.columns))
new_col = [x for x in a if not x.startswith("col")]  # or whatever they start with

df = df.select(new_col)
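The same filtering idea also covers blank or empty column names rather than a shared "col" prefix. A minimal sketch on a plain list (the column names here are a hypothetical stand-in for df.columns):

```python
# Hypothetical df.columns list containing blank-named columns
columns = ["st", " ", "ani", " ", ""]

# Keep only the names that are non-blank after stripping whitespace
new_col = [c for c in columns if c.strip()]
print(new_col)  # ['st', 'ani']
```

Unlike list(set(...)), a list comprehension over df.columns also preserves the original column order.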

For "pyspark - how to drop empty columns from a pyspark dataframe", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59676461/
