
python - Removing wiki markup from a Python string

Reposted. Author: 行者123. Updated: 2023-12-01 05:59:48

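I have a string containing information downloaded from a Wikia page.

To parse its contents, how can I strip all of the wiki formatting from the page and leave only the raw text?

Here is an example of what can show up: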

#REDIRECT[[Blah]]

{{
I have some stuff in here
}}
[[I also have some stuff in here|and here]]
[[http://blehthisisfake.com Link to a fake website]]

<span class="plainlinks">This is quite useless. Why was [[this page]] even created?</span>

<nowiki>There are more HTML tags, they should probably all be stripped...</nowiki>

There is random text in here. bleh bleh bleh

I'm not sure what single [brackets] do, but they should be stripped too...

Expected output:

There is random text in here. bleh bleh blehI'm not sure what single do, but they should be stripped too...

Is there a module that can do this?

Best answer

A Google search for "python wiki parser" turns up this code, which strips and replaces the tags (see the source code at the link for details).
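The linked code is not reproduced here, so the following is only a minimal regex-based sketch of the same idea, not that code. It assumes the goal shown in the question's expected output: markup is dropped together with its contents, including text inside templates, links, HTML elements and single brackets. The function name `strip_wiki_markup` is hypothetical.

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Drop wiki markup and its contents, keeping only plain text lines."""
    # Redirect directives such as "#REDIRECT[[Blah]]".
    text = re.sub(r"(?im)^#REDIRECT\s*\[\[.*?\]\]", "", text)

    # Templates "{{ ... }}"; loop so nested templates collapse inside-out.
    template = re.compile(r"\{\{[^{}]*\}\}")
    while template.search(text):
        text = template.sub("", text)

    # Paired HTML-like elements, e.g. <span ...>...</span> or <nowiki>...</nowiki>.
    text = re.sub(r"(?is)<(\w+)[^>]*>.*?</\1>", "", text)
    # Any leftover unpaired tags.
    text = re.sub(r"<[^>]+>", "", text)

    # Double-bracket links first, then single-bracket spans.
    text = re.sub(r"\[\[[^\[\]]*\]\]", "", text)
    text = re.sub(r"\[[^\[\]]*\]", "", text)

    # Keep only the lines that still contain text.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "{{ infobox }}\nPlain text with [brackets] here.\n[[Some|link]]"
    print(strip_wiki_markup(sample))
```

Because bracketed spans are deleted rather than unwrapped, a double space can remain where they stood, as in the expected output above. Regular expressions cannot handle every wiki construct (tables, nested links, unbalanced braces), so a dedicated wikitext parser is likely a better fit for real pages.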

Regarding "python - Removing wiki markup from a Python string", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/11060877/
