
python - Extract URLs from a file

Reposted. Author: 太空宇宙. Updated: 2023-11-03 21:13:23

I am trying to extract URLs from a file that has the following format.

[CertSpotter]     wwwqa.xyz.abc.com,1.1.1.1
[CertSpotter] origin.xyz.abc.com,1.1.1.1
[CertSpotter] wwwqa.xyz.abc.com,1.1.1.1
[CertSpotter] wwwmg4.xyz.abc.com,1.1.1.1

I found a Python script, but it returns both the URLs and the IPs, and I need only the URLs.

import re

file_path = input("Enter the File Path: ")
f = open(file_path, 'r')
raw_text= str(f.readlines())
f.close()

domain = r"\b((?:https?://)?(?:(?:www\.)?(?:[\da-z\.-]+)\.(?:[a-z]{2,6})|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:(?:[0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,7}:|(?:[0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,5}(?::[0-9a-fA-F]{1,4}){1,2}|(?:[0-9a-fA-F]{1,4}:){1,4}(?::[0-9a-fA-F]{1,4}){1,3}|(?:[0-9a-fA-F]{1,4}:){1,3}(?::[0-9a-fA-F]{1,4}){1,4}|(?:[0-9a-fA-F]{1,4}:){1,2}(?::[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:(?:(?::[0-9a-fA-F]{1,4}){1,6})|:(?:(?::[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(?::[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(?:ffff(?::0{1,4}){0,1}:){0,1}(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])|(?:[0-9a-fA-F]{1,4}:){1,4}:(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])))(?::[0-9]{1,4}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])?(?:/[\w\.-]*)*/?)\b"
foundip = re.findall(domain, raw_text)
for ip in foundip:
    print(ip)

After running the script, I get the following output.

wwwqa.xyz.abc.com
1.1.1.1
origin.xyz.abc.com
1.1.1.1
wwwmg4.xyz.abc.com
1.1.1.1

Desired output:

wwwqa.xyz.abc.com
origin.xyz.abc.com
wwwmg4.xyz.abc.com

Can anyone help me solve this?

Thanks

Best answer

No regex is needed. Plain str methods are enough.

For example:

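with open(filename) as infile:
    for line in infile:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Last whitespace-separated field is "host,ip"; keep the part before the comma.
        val = fields[-1].split(",")[0]
        print(val)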

Output:

wwwqa.xyz.abc.com
origin.xyz.abc.com
wwwqa.xyz.abc.com
wwwmg4.xyz.abc.com
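
The output above still lists wwwqa.xyz.abc.com twice, while the desired output in the question shows each host once. If duplicates should be dropped, a minimal sketch could look like the following. It assumes every non-blank line has the form "[CertSpotter] host,ip" and that first-seen order should be kept; the function name extract_hosts and the sample list are hypothetical and not part of the original answer.

```python
def extract_hosts(lines):
    """Return unique host names from lines like '[CertSpotter] host,ip'.

    Hosts are returned in the order they first appear.
    """
    seen = {}  # dicts keep insertion order (Python 3.7+), so this deduplicates in order
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Last whitespace-separated field is "host,ip"; keep the part before the comma.
        host = fields[-1].split(",")[0]
        seen.setdefault(host, None)
    return list(seen)


# Sample data copied from the question.
sample = [
    "[CertSpotter]     wwwqa.xyz.abc.com,1.1.1.1",
    "[CertSpotter] origin.xyz.abc.com,1.1.1.1",
    "[CertSpotter] wwwqa.xyz.abc.com,1.1.1.1",
    "[CertSpotter] wwwmg4.xyz.abc.com,1.1.1.1",
]

for host in extract_hosts(sample):
    print(host)
```

Because an open file object yields its lines when iterated, the same function can be called as extract_hosts(infile) inside a with open(...) block instead of passing a list.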

This post, python - Extract URLs from a file, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54886925/
