
python - How to replace special characters with a regex in pyspark


I am trying to replace }{ with },{ in a text file, but I get the following error:

return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

I am writing the Spark job in Python (pyspark).
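
For reference, re.sub expects a plain string (or buffer) as its third argument, so the same message appears whenever anything else is passed in; a minimal sketch in an ordinary Python shell:

import re

sample = '{"a":1}{"b":2}'
print(re.sub("}{", "},{", sample))  # third argument is a str, prints {"a":1},{"b":2}

try:
    re.sub("}{", "},{", 123)  # third argument is not a string
except TypeError as e:
    print(e)  # "expected string or buffer" on Python 2, "expected string or bytes-like object" on Python 3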

Code:

from pyspark.sql import SparkSession
import re
import sys

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: PythonBLEDataParser.py <file>", file=sys.stderr)
        exit(-1)

    spark = SparkSession\
        .builder\
        .appName("PythonBLEDataParser")\
        .getOrCreate()

    toJson = spark.sparkContext.textFile("/root/vasi/spark-2.2.0-bin-hadoop2.7/vas_files/BLE_data_Sample.txt")
    toJson1 = re.sub("}{", "},{", toJson)  # I want to replace }{ with },{
    print(toJson1)

Sample data:

{"EdgeMac":"E4956E4E4015","BeaconMac":"247189F24DDB","RSSI":-59,"MPow":-76,"Timestamp":"1486889542495633","AdData":"0201060303AAFE1716AAFE00DD61687109E602F514C96D00000001F05C0000"}
{"EdgeMac":"E4956E4E4016","BeaconMac":"247189F24DDC","RSSI":-59,"MPow":-76,"Timestamp":"1486889542495633","AdData":"0201060303AAFE1716AAFE00DD61687109E602F514C96D00000001F05C0000"}
{"EdgeMac":"E4956E4E4017","BeaconMac":"247189F24DDD,"RSSI":-59,"MPow":-76,"Timestamp":"1486889542495633","AdData":"0201060303AAFE1716AAFE00DD61687109E602F514C96D00000001F05C0000"}

Best answer

Try using a DataFrame instead of an RDD and it works. Just put an escape character before each brace:

from pyspark.sql.functions import regexp_replace

df_sample = spark.read.text('path/to/sample.txt')
df_sample.withColumn('value', regexp_replace(df_sample['value'], '\\}\\{', '},{')).collect()[0]
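
The error in the question comes from calling re.sub directly on the RDD returned by textFile instead of on a string. If you prefer to stay with the RDD API, the substitution can also be applied line by line with map; a sketch under that assumption, with a placeholder input path:

from pyspark.sql import SparkSession
import re

spark = SparkSession.builder.appName("PythonBLEDataParser").getOrCreate()

# textFile returns an RDD of lines, so apply re.sub to each line rather than to the RDD object
lines = spark.sparkContext.textFile("path/to/BLE_data_Sample.txt")
fixed = lines.map(lambda line: re.sub(r"\}\{", "},{", line))
print(fixed.take(3))

Either way, the replacement has to run on individual strings (DataFrame column values or RDD lines), not on the distributed collection itself.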

Regarding "python - How to replace special characters with a regex in pyspark", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47648038/
