
regex - Formatting a string as an XML file

Reposted. Author: 行者123. Last updated: 2023-12-02 20:29:51

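I want to reformat a string into an XML structure, but my string is not in XML format (I am using Python 2.7).
I believe the right approach is to first produce the XML form of the input on a single line, and then use an XML pretty-printer to turn it into a multi-line, indented XML file
(see "Pretty printing XML in Python").
Below is a sample of the input, obtained from a History Server REST API call to a Hadoop server.
Input: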

'{"jobAttempts":{"jobAttempt":[{"nodeHttpAddress":"slave2:8042","nodeId":"slave2:39637","id":1,"startTime":1544691730439,"containerId":"container_1544631848492_0013_01_000001","logsLink":"http://23.22.43.90:19888/jobhistory/logs/slave2:39637/container_1544631848492_0013_01_000001/job_1544631848492_0013/hadoop2"}]}}' 
Output:
'<jobAttempts><jobAttempt><nodeHttpAddress>slave2:8042</nodeHttpAddress><nodeId>slave2:39637</nodeId><id>1</id><startTime>1544691730439</startTime><containerId>container_1544631848492_0013_01_000001</containerId><logsLink>http://23.22.43.90:19888/jobhistory/logs/slave2:39637/container_1544631848492_0013_01_000001/job_1544631848492_0013/hadoop2</logsLink></jobAttempt></jobAttempts>' 
Final output:
<jobAttempts>
  <jobAttempt>
    <nodeHttpAddress>slave2:8042</nodeHttpAddress>
    <nodeId>slave2:39637</nodeId>
    <id>1</id>
    <startTime>1544691730439</startTime>
    <containerId>container_1544631848492_0013_01_000001</containerId>
    <logsLink>http://23.22.43.90:19888/jobhistory/logs/slave2:39637/container_1544631848492_0013_01_000001/job_1544631848492_0013/hadoop2</logsLink>
  </jobAttempt>
</jobAttempts>
* This string is actually an XML file that does not appear to have any style information associated with it.
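The two-step approach described in the question (build the one-line XML, then pretty-print it) can be sketched roughly as follows. This is only an illustration, not the method the answer below ends up using. The helper names `append_value`, `json_to_xml` and `pretty_print` are hypothetical, and the sketch assumes the JSON has exactly one top-level key whose value is an object, and that every key is a valid XML tag name. The sample is a shortened version of the input above.

```python
import json
import xml.dom.minidom
import xml.etree.ElementTree as ET
from collections import OrderedDict


def append_value(parent, tag, value):
    """Append `value` under `parent` as one or more <tag> elements (hypothetical helper)."""
    if isinstance(value, list):
        # A JSON array becomes repeated sibling elements with the same tag.
        for item in value:
            append_value(parent, tag, item)
    elif isinstance(value, dict):
        child = ET.SubElement(parent, tag)
        for key, item in value.items():
            append_value(child, key, item)
    else:
        # Scalars (numbers, strings) become the element's text.
        ET.SubElement(parent, tag).text = u"%s" % (value,)


def json_to_xml(json_text):
    """Return the one-line XML form of a JSON document with a single top-level key."""
    # OrderedDict keeps the keys in their original order on older Pythons too.
    data = json.loads(json_text, object_pairs_hook=OrderedDict)
    if len(data) != 1:
        raise ValueError("expected exactly one top-level key")
    root_tag, root_value = list(data.items())[0]
    root = ET.Element(root_tag)
    for key, item in root_value.items():
        append_value(root, key, item)
    return ET.tostring(root).decode("ascii")


def pretty_print(one_line_xml):
    """Indent a one-line XML string with minidom."""
    return xml.dom.minidom.parseString(one_line_xml).toprettyxml(indent="  ")


# Shortened version of the sample input from the question.
sample = '{"jobAttempts":{"jobAttempt":[{"nodeId":"slave2:39637","id":1}]}}'
print(pretty_print(json_to_xml(sample)))
```

Mapping a JSON array to repeated elements with the same tag is a design choice made here because it reproduces the `<jobAttempt>` nesting shown in the expected output. Other JSON shapes may need a different mapping.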

Best answer

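I found that the source view of the History Server REST API response is in fact an XML file on a single line. So I had to read the page's source view with Python rather than the usual view.
Previously I was using: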

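import urllib2
# Fetch the raw response body (Python 2.7); the duplicated "http://" and the double slash in the path are corrected here.
contents = urllib2.urlopen("http://23.22.43.90:19888/ws/v1/history/mapreduce/jobs/job_1544631848492_0013/jobattempts").read()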

Now I download the page's source view with selenium and BeautifulSoup and save it locally.
from bs4 import BeautifulSoup
from selenium import webdriver
import xml.dom.minidom

# Load the REST endpoint in a real browser and capture the page source.
driver = webdriver.Firefox()
driver.get("http://23.22.43.90:19888/ws/v1/history/mapreduce/jobs/job_1544631848492_0013/jobattempts")
page_source = driver.page_source
driver.quit()

# Parse the source, then pretty-print it with minidom.
soup = BeautifulSoup(page_source, "html.parser")
print(soup)
# Named `dom` so the imported `xml` module is not shadowed.
dom = xml.dom.minidom.parseString(str(soup))
pretty_xml_as_string = dom.toprettyxml()

# Save the indented XML locally; `with` closes the file even on errors.
with open("./content_new_2.xml", "w") as out_file:
    out_file.write(pretty_xml_as_string)
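A lighter alternative to driving a browser may be to ask the server for XML directly. The Hadoop REST API documentation describes selecting the response format with the `Accept` header, so the sketch below assumes this History Server honors `Accept: application/xml`; that has not been verified against the server in the question. The names `build_xml_request` and `fetch_pretty_xml` are hypothetical.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7, as used in the question

import xml.dom.minidom

JOB_ATTEMPTS_URL = ("http://23.22.43.90:19888/ws/v1/history/mapreduce/jobs/"
                    "job_1544631848492_0013/jobattempts")


def build_xml_request(url):
    """Build a request that asks for XML instead of JSON (assumes the server honors Accept)."""
    return Request(url, headers={"Accept": "application/xml"})


def fetch_pretty_xml(url):
    """Download the response and return it as indented XML text."""
    raw = urlopen(build_xml_request(url)).read()
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
```

Calling `fetch_pretty_xml(JOB_ATTEMPTS_URL)` and writing the result to a file would then replace the selenium and BeautifulSoup steps, provided the server really does return XML for this request.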

Regarding "regex - Formatting a string as an XML file", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53796175/
