
python - pySpark: how to access the values of a tuple inside a (key, tuple) RDD (python)


I am trying to access the values contained in a PipelinedRDD. Here is what I start with:

1. RDD = (key, code, value)

data = [(11720, (u'I50800', 0.08229813664596274)), (11720, (u'I50801', 0.03076923076923077))]

2. I need it grouped by the first value and transformed into (key, tuple), where tuple = (code, value):

testFeatures = labEvents.select('ITEMID', 'SUBJECT_ID', 'NORM_ITEM_CNT') \
    .orderBy('SUBJECT_ID', 'ITEMID') \
    .rdd.map(lambda r: (r.SUBJECT_ID, (r.ITEMID, r.NORM_ITEM_CNT))) \
    .groupByKey()

testFeatures = [(11720, [(u'I50800', 0.08229813664596274), (u'I50801', 0.03076923076923077)])]
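
For reference, a minimal pure-RDD sketch of this grouping step (assuming only the small data list above; groupByKey returns an iterable per key, so mapValues(list) is used to make the result printable):

import pyspark

sc = pyspark.SparkContext.getOrCreate()
data = [(11720, (u'I50800', 0.08229813664596274)),
        (11720, (u'I50801', 0.03076923076923077))]
# Group the (code, value) tuples under their shared key and
# materialize each group's iterable as a plain list
grouped = sc.parallelize(data).groupByKey().mapValues(list)
print(grouped.collect())
# [(11720, [(u'I50800', 0.08229813664596274), (u'I50801', 0.03076923076923077)])]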

From each tuple = (code, value), I want to do the following:

create a sparse vector from it so that I can use it for an SVM model.

result.take(1)
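
For illustration, a hedged sketch of one way to turn the grouped (code, value) lists into pyspark.mllib SparseVectors; the code_index mapping and num_features here are hypothetical (in practice the index would be built from the distinct codes in the data), and grouped is the RDD from the sketch above:

from pyspark.mllib.linalg import SparseVector

# Hypothetical mapping from lab code to feature index
code_index = {u'I50800': 0, u'I50801': 1}
num_features = len(code_index)

def to_sparse(pairs):
    # pairs is the list of (code, value) tuples for one key;
    # SparseVector accepts a size plus a list of (index, value) pairs
    idx_val = sorted((code_index[code], value) for code, value in pairs)
    return SparseVector(num_features, idx_val)

features = grouped.mapValues(to_sparse)  # (key, SparseVector) pairs

An RDD in this shape can then be wrapped in pyspark.mllib LabeledPoint objects to train SVMWithSGD.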

Best Answer

Here is one way to do it:

import pyspark
import pyspark.sql.functions as sf
import pyspark.sql.types as sparktypes
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)

data = [(11720, (u'I50800', 0.08229813664596274)),
(11720, (u'I50801', 0.03076923076923077))]
rdd = sc.parallelize(data)

df = sqlc.createDataFrame(rdd, ['idx', 'tuple'])
df.show()

which gives:

+-----+--------------------+
| idx| tuple|
+-----+--------------------+
|11720|[I50800,0.0822981...|
|11720|[I50801,0.0307692...|
+-----+--------------------+

Now define the PySpark user-defined functions:

extract_tuple_0 = sf.udf(lambda x: x[0], returnType=sparktypes.StringType())
extract_tuple_1 = sf.udf(lambda x: x[1], returnType=sparktypes.FloatType())
df = df.withColumn('tup0', extract_tuple_0(sf.col('tuple')))
df = df.withColumn('tup1', extract_tuple_1(sf.col('tuple')))
df.show()

which gives:

+-----+--------------------+------+----------+
|  idx|               tuple|  tup0|      tup1|
+-----+--------------------+------+----------+
|11720|[I50800,0.0822981...|I50800|0.08229814|
|11720|[I50801,0.0307692...|I50801|0.03076923|
+-----+--------------------+------+----------+
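
As an aside, the nested tuple is inferred as a struct column, so the same two columns can usually be extracted without UDFs by reading the struct fields directly; a minimal sketch, assuming the inferred field names are _1 and _2 (df.printSchema() shows the actual names):

# Equivalent extraction via struct field access instead of UDFs
df = df.withColumn('tup0', sf.col('tuple').getField('_1'))
df = df.withColumn('tup1', sf.col('tuple').getField('_2'))
df.show()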

Regarding python - pySpark: how to access the values of a tuple inside a (key, tuple) RDD (python), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43209811/
