
python - Parsing text in Python (Django)

Reposted. Author: 太空宇宙. Updated: 2023-11-04 06:12:11

My text looks like this:

Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', url='http://www.somesite.com/prof.php?pID=478', text='SomeSite -  Professor Rating of Louis Scerbo', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pID=478'), ('h', 'ID=SERP,5105.1')])Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', url='http://www.somesite.com/prof.php?pID=527', text='SomeSite -  Professor Rating of Jahan \xe2\x80\xa6', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pID=527'), ('h', 'ID=SERP,5118.1')])Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', url='http://www.somesite.com/prof.php?pID=645', text='SomeSite -  Professor Rating of David Kutzik', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pID=645'), ('h', 'ID=SERP,5131.1')])

Questions:

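  1. Does anyone know what format this text is in?

  2. For example, how would I parse out the values of the url element from the text above, i.e. http://www.somesite.com/prof.php?pID=478 and http://www.somesite.com/prof.php?pID=527?

  3. What Python library would you recommend for parsing this type of output (XML, JSON, etc.)?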
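Regarding questions 1 and 3: the text has the shape of Python repr() output for objects of a class named Link, printed back to back with no separator. It appears to match how the mechanize library prints its Link objects, though that is an inference from the field names, not something the question confirms. Because every argument is a plain Python literal, the standard-library ast module can parse it without any third-party package. The following is a minimal sketch rather than a general parser: it assumes Python 3.7+ (needed for splitting on a zero-width match) and that the string "Link(" never occurs inside a quoted value; parse_link_reprs and SAMPLE are hypothetical names.

```python
import ast
import re

# Shortened copy of the question's text. A raw string keeps \xe2\x80\xa6 as
# backslash sequences, which is how they appear in a printed repr.
SAMPLE = (
    r"Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', "
    r"url='http://www.somesite.com/prof.php?pID=478', "
    r"text='SomeSite -  Professor Rating of Louis Scerbo', tag='a', "
    r"attrs=[('href', 'http://www.somesite.com/prof.php?pID=478'), ('h', 'ID=SERP,5105.1')])"
    r"Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', "
    r"url='http://www.somesite.com/prof.php?pID=527', "
    r"text='SomeSite -  Professor Rating of Jahan \xe2\x80\xa6', tag='a', "
    r"attrs=[('href', 'http://www.somesite.com/prof.php?pID=527'), ('h', 'ID=SERP,5118.1')])"
)


def parse_link_reprs(text):
    """Turn concatenated 'Link(...)' reprs into a list of keyword-argument dicts."""
    links = []
    # Split just before each "Link(" (zero-width split requires Python 3.7+).
    # Assumes "Link(" never appears inside a quoted value.
    for chunk in re.split(r"(?=Link\()", text):
        chunk = chunk.strip()
        if not chunk:
            continue  # the split yields an empty string before the first "Link("
        # Each chunk is a single call expression: Link(base_url=..., url=..., ...)
        call = ast.parse(chunk, mode="eval").body
        # literal_eval only accepts literals, so nothing in the text is executed.
        links.append({kw.arg: ast.literal_eval(kw.value) for kw in call.keywords})
    return links


links = parse_link_reprs(SAMPLE)
urls = [link["url"] for link in links]
print(urls)
# ['http://www.somesite.com/prof.php?pID=478', 'http://www.somesite.com/prof.php?pID=527']
```

Unlike a regex, this keeps every field (text, tag, attrs) available as ordinary Python values, so the same dicts can be reused if more than url is needed later.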

I just want to loop over the links and extract only the value of url.

Keep in mind that I'm using Django.

Thanks for any help you can offer.

EDIT: current code:

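import re
from django.http import HttpResponse

# Fragment from inside a Django view function; domainLinkOutput is defined earlier.
domainLinkOutputAsString = str(domainLinkOutput)

r = re.compile(" url='(.*?)',")  # ERRONEOUS: must be 're'-compliant.

ProperDomains = r.findall(domainLinkOutputAsString)

return HttpResponse(ProperDomains)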
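The current code turns domainLinkOutput into a string only to parse it back apart. If domainLinkOutput is an iterable of Link-like objects with a url attribute (an assumption; the question does not show where it comes from), reading that attribute directly skips the regex entirely. The sketch below uses a namedtuple as a hypothetical stand-in for those objects, and link_urls and urls_as_text are illustrative helper names. It also joins the URLs with newlines: when HttpResponse is given a list, Django concatenates the items with no separator, so the URLs would run together in the response.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in domainLinkOutput; only the field
# names are taken from the printed text in the question.
Link = namedtuple("Link", "base_url url text tag attrs")


def link_urls(links):
    """Return the url attribute of each link, without going through str()."""
    return [link.url for link in links]


def urls_as_text(links):
    """One URL per line, suitable as a plain-text response body."""
    return "\n".join(link_urls(links))


domain_link_output = [
    Link("http://www.bing.com/search?q=site%3Asomesite.com",
         "http://www.somesite.com/prof.php?pID=478",
         "SomeSite -  Professor Rating of Louis Scerbo", "a",
         [("href", "http://www.somesite.com/prof.php?pID=478")]),
    Link("http://www.bing.com/search?q=site%3Asomesite.com",
         "http://www.somesite.com/prof.php?pID=645",
         "SomeSite -  Professor Rating of David Kutzik", "a",
         [("href", "http://www.somesite.com/prof.php?pID=645")]),
]

print(urls_as_text(domain_link_output))
# Inside the Django view (not run here), the last line would become:
# return HttpResponse(urls_as_text(domainLinkOutput), content_type="text/plain")
```

That same concatenation behaviour could also explain the back-to-back Link(...) text in the question, if the raw link objects were at some point passed straight to HttpResponse, but that is only a guess.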

Best answer

You can simply use Python's regular expressions (the re module):

import re
text = "Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', url='http://www.somesite.com/prof.php?pID=478', text='SomeSite - Professor Rating of Louis Scerbo', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pID=478'), ('h', 'ID=SERP,5105.1')])Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', url='http://www.somesite.com/prof.php?pID=527', text='SomeSite - Professor Rating of Jahan \xe2\x80\xa6', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pID=527'), ('h', 'ID=SERP,5118.1')])Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', url='http://www.somesite.com/prof.php?pID=645', text='SomeSite - Professor Rating of David Kutzik', tag='a', attrs=[('href', 'http://www.somesite.com/prof.php?pID=645'), ('h', 'ID=SERP,5131.1')])"

# Create the regexp object to match the value of 'url'
r = re.compile(" url='(.*?)',")

# Print all matches
print(r.findall(text))

# Output:
# ['http://www.somesite.com/prof.php?pID=478', 'http://www.somesite.com/prof.php?pID=527', 'http://www.somesite.com/prof.php?pID=645']
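The answer's pattern depends on the surrounding punctuation: it needs a space before url= and the sequence ', right after the closing quote, so it would miss a url argument that comes first or last in the call. A slightly more defensive variant, sketched below, anchors on a word boundary instead (which still rejects base_url, because _ is a word character) and stops at the closing quote. The second Link in the sample is a made-up input, not taken from the question; like the original, this pattern still assumes the URLs contain no single quotes.

```python
import re

# Pattern from the answer: needs " url='" before and "'," after the value.
ANSWER_RE = re.compile(" url='(.*?)',")

# Variant: \b rejects "base_url" ("_" is a word character, so there is no
# boundary before "url"), and [^']* stops at the closing quote.
URL_RE = re.compile(r"\burl='([^']*)'")

# Hypothetical input: the second Link has url as its only argument.
text = (
    "Link(base_url=u'http://www.bing.com/search?q=site%3Asomesite.com', "
    "url='http://www.somesite.com/prof.php?pID=478', tag='a')"
    "Link(url='http://www.somesite.com/prof.php?pID=527')"
)

print(ANSWER_RE.findall(text))  # ['http://www.somesite.com/prof.php?pID=478']
print(URL_RE.findall(text))
# ['http://www.somesite.com/prof.php?pID=478', 'http://www.somesite.com/prof.php?pID=527']
```

On the question's own text both patterns return the same three URLs; the difference only shows up when the argument order or punctuation changes.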

A similar question about parsing text in Python (Django) can be found on Stack Overflow: https://stackoverflow.com/questions/18138489/
