
python - Decoding a decimal value to binary in PySpark

Reposted. Author: 行者123  Updated: 2023-12-01 07:37:08

I have a question about decoding a decimal value to binary in PySpark. This is how I would do it in plain Python:

a = 28
b = format(a, "09b")
print(b)

-> 000011100

Here is an example DataFrame I want to convert:

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([Row(a=1, b='28', c='11', d='foo'),
                            Row(a=2, b='28', c='44', d='bar'),
                            Row(a=3, b='28', c='22', d='foo')])

+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
|  1| 28| 11|foo|
|  2| 28| 44|bar|
|  3| 28| 22|foo|
+---+---+---+---+

I would like column 'b' to be decoded like this:

+---+---------+---+---+
|  a|        b|  c|  d|
+---+---------+---+---+
|  1|000011100| 11|foo|
|  2|000011100| 44|bar|
|  3|000011100| 22|foo|
+---+---------+---+---+

Thanks for your help!

Best Answer

Use the `bin` and `lpad` functions to get the same output:

import pyspark.sql.functions as f
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([Row(a=1, b='28', c='11', d='foo'),
                            Row(a=2, b='28', c='44', d='bar'),
                            Row(a=3, b='28', c='22', d='foo')])

# bin() converts the numeric string to its binary representation,
# and lpad() left-pads the result with '0' to a width of 9.
df = df.withColumn('b', f.lpad(f.bin(df['b']), 9, '0'))
df.show()
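As a sanity check outside Spark, here is a minimal pure-Python sketch of what `f.lpad(f.bin(...), 9, '0')` computes per value (the helper name `lpad_bin` is mine, not a Spark API):

```python
def lpad_bin(value: int, width: int = 9) -> str:
    # Python's bin() yields e.g. '0b11100'; Spark's f.bin() returns the
    # digits without the '0b' prefix, which lpad() then zero-pads to `width`.
    return bin(value)[2:].rjust(width, "0")

print(lpad_bin(28))  # 000011100
```

This matches the plain-Python `format(28, "09b")` from the question.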

Using a UDF:

import pyspark.sql.functions as f
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([Row(a=1, b='28', c='11', d='foo'),
                            Row(a=2, b='28', c='44', d='bar'),
                            Row(a=3, b='28', c='22', d='foo')])


# With no return type given, f.udf() defaults to StringType.
@f.udf()
def to_binary(value):
    # Parse the string column value and format it as
    # a 9-digit zero-padded binary string.
    return format(int(value), "09b")


df = df.withColumn('b', to_binary(df['b']))
df.show()
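Since the UDF body is plain Python, its per-row behavior can be checked without a Spark session (a sketch; `to_binary_plain` is a hypothetical standalone copy of the UDF body):

```python
def to_binary_plain(value) -> str:
    # Same logic as the UDF body: parse the string column value and
    # format it as a 9-digit zero-padded binary string.
    return format(int(value), "09b")

print(to_binary_plain("28"))  # 000011100
```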

Output:

+---+---------+---+---+
|  a|        b|  c|  d|
+---+---------+---+---+
|  1|000011100| 11|foo|
|  2|000011100| 44|bar|
|  3|000011100| 22|foo|
+---+---------+---+---+

Regarding "python - Decoding a decimal value to binary in PySpark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56937420/
