
python - Mapping a result to another file using Python Spark

Reposted. Author: 行者123. Updated: 2023-12-01 02:35:03

I'm new to Spark and Python and am trying some basic things, such as printing counts and maximums over employee data.

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Hello") \
    .getOrCreate()

sc = spark.sparkContext

# Each line of employee.txt has the form employeeid::deptid::salary
df = spark.createDataFrame(
    sc.textFile("employee.txt").map(lambda l: l.split('::')),
    ["employeeid", "deptid", "salary"]
)
# Expose the DataFrame to SQL queries under the name "df"
df.createOrReplaceTempView("df")

# Count employees per department, then keep the department(s) whose count
# equals the largest count; max(count(*)) over () spans the whole grouped result
mostEmpDept = spark.sql("""select deptid, cntDept from (
select deptid, count(*) as cntDept, max(count(*)) over () as maxcnt
from df
group by deptid) as tmp
where tmp.cntDept = tmp.maxcnt""")

mostEmpDept.show()

The code above returns the department with the most employees:

+------+-------+
|deptid|cntDept|
+------+-------+
|    10|      7|
+------+-------+

Now I have another file that contains every deptid and its name. How can I map this result to that file and print the name of deptid 10? The other file looks like this:

10::Marketing
20::Finance
30::HumanResource
40::HouseKeeping
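Conceptually, mapping the result to the department file is a join on deptid: each deptid in the result is looked up in a table built from the second file. The plain-Python sketch below (not Spark) illustrates that idea using the four department lines above and the (10, 7) row from the query output; keeping deptid as a string assumes both files are parsed the same way, with split('::').

```python
# Department file lines as quoted in the question (deptid::deptname)
dept_lines = ["10::Marketing", "20::Finance", "30::HumanResource", "40::HouseKeeping"]

# Lookup table keyed by deptid; split("::", 1) yields a [deptid, deptname] pair
dept_names = dict(line.split("::", 1) for line in dept_lines)

# The (deptid, cntDept) row from the query output in the question
most_emp_dept = [("10", 7)]

# Inner join: keep result rows whose deptid has a match and attach the name
joined = [(deptid, dept_names[deptid]) for deptid, _ in most_emp_dept if deptid in dept_names]
print(joined)  # [('10', 'Marketing')]
```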

Best answer

Use the following (it reuses the spark session created in the question):

sc = spark.sparkContext

# employee.txt: employeeid::deptid::salary
df = spark.createDataFrame(
    sc.textFile("employee.txt").map(lambda l: l.split('::')),
    ["employeeid", "deptid", "salary"]
)
df.createOrReplaceTempView("df")

# dept.txt: deptid::deptname
dept = spark.createDataFrame(
    sc.textFile("dept.txt").map(lambda l: l.split('::')),
    ["deptid", "deptname"]
)
dept.createOrReplaceTempView("dept")

mostEmpDept = spark.sql("""select deptid, cntDept from (
select deptid, count(*) as cntDept, max(count(*)) over () as maxcnt
from df
group by deptid) as tmp
where tmp.cntDept = tmp.maxcnt""")

mostEmpDept.createOrReplaceTempView("mostEmpDept")

# Both deptid columns are strings (produced by split), so they compare directly
final_df = spark.sql("select a.deptid, b.deptname from mostEmpDept a inner join dept b on a.deptid = b.deptid")

final_df.show()

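To save the result, write it through the DataFrame writer (saveAsTextFile is an RDD method; a DataFrame does not have it):

final_df.write.csv('Location')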

For "python - Mapping a result to another file using Python Spark", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46350094/
