
Python: append() and extend()

Reposted. Author: 行者123. Updated: 2023-11-28 21:32:09

I have a .txt file with 3 million lines. The file contains data that looks like this:

# RSYNC: 0 1 1 0 512 0
#$SOA 5m localhost. hostmaster.localhost. 1906022338 1h 10m 5d 1s
# random_number_ofspaces_before_this text $TTL 60s
#more random information
:127.0.1.2:https://www.spamhaus.org/query/domain/$
test
:127.0.1.2:https://www.spamhaus.org/query/domain/$
.0-0m5tk.com
.0-1-hub.com
.zzzy1129.cn
:127.0.1.4:https://www.spamhaus.org/query/domain/$
.0-il.ml
.005verf-desj.com
.01accesfunds.com

I am trying to parse it so that it looks like this:

+--------------------+--------------+-------------+-----------------------------------------------------+
| domain_name | period_count | parsed_code | raw_code |
+--------------------+--------------+-------------+-----------------------------------------------------+
| test | 0 | 127.0.1.2 | :127.0.1.2:https://www.spamhaus.org/query/domain/$ |
| .0-0m5tk.com | 2 | 127.0.1.2 | :127.0.1.2:https://www.spamhaus.org/query/domain/$ |
| .0-1-hub.com | 2 | 127.0.1.2 | :127.0.1.2:https://www.spamhaus.org/query/domain/$ |
| .zzzy1129.cn | 2 | 127.0.1.2 | :127.0.1.2:https://www.spamhaus.org/query/domain/$ |
| .0-il.ml | 2 | 127.0.1.4 | :127.0.1.4:https://www.spamhaus.org/query/domain/$ |
| .005verf-desj.com | 2 | 127.0.1.4 | :127.0.1.4:https://www.spamhaus.org/query/domain/$ |
| .01accesfunds.com | 2 | 127.0.1.4 | :127.0.1.4:https://www.spamhaus.org/query/domain/$ |
+--------------------+--------------+-------------+-----------------------------------------------------+

To do this, I came up with the following:

rows = []
raw_code = None
parsed_code = None
with open('dbl-sr-2019-06-02T23_38_27Z.txt', 'r') as f: # assumes the file name is input.txt
    for line in f:
        line = line.rstrip('\n')
        if line.startswith(':127'):
            raw_code = line
            parsed_code = re.split(":", line)[1]
            continue
        if line.startswith('#'):
            continue
        rows.append((line, parsed_code))
        # rows.append((raw_code))
        # rows.extend((line, parsed_code, raw_code))
        # rows.extend((raw_code))

import pandas as pd
df = pd.DataFrame(rows, columns=['domain_name', "parsed_code" 'raw_spamhaus_return_code'])
print(df)

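The commented-out lines in the code block above either did not produce the output I wanted or raised an error. I am struggling to build a pandas DataFrame with more than 2 columns. I can get domain_name and one other column, but I can't seem to get the code to use .append() or .extend() correctly. Can someone point me in the right direction?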

Best answer

The likely root of the problem is a missing comma.

This:

df = pd.DataFrame(rows, columns=[
'domain_name', 'parsed_code', 'raw_spamhaus_return_code'])

is not equivalent to:

df = pd.DataFrame(rows, columns=[
'domain_name', "parsed_code" 'raw_spamhaus_return_code'])

because (note the missing comma):

"parsed_code" 'raw_spamhaus_return_code'

becomes a single string: Python concatenates adjacent string literals.
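The effect of the missing comma can be checked without pandas. This is a minimal sketch; the variable names `with_comma` and `without_comma` are illustrative and do not appear in the question.

```python
# Adjacent string literals are joined into one string by the Python parser,
# so the second list has two elements, not three.
with_comma = ['domain_name', 'parsed_code', 'raw_spamhaus_return_code']
without_comma = ['domain_name', "parsed_code" 'raw_spamhaus_return_code']

print(len(with_comma))     # 3
print(len(without_comma))  # 2
print(without_comma[1])    # parsed_coderaw_spamhaus_return_code
```

This would explain why the two-column version in the question appeared to work: two-element rows match a two-name column list, while rows with three or more elements no longer do.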

Test code:

import re

data = [x.strip() for x in """
# RSYNC: 0 1 1 0 512 0
#$SOA 5m localhost. hostmaster.localhost. 1906022338 1h 10m 5d 1s
# random_number_ofspaces_before_this text $TTL 60s
#more random information
:127.0.1.2:https://www.spamhaus.org/query/domain/$
test
:127.0.1.2:https://www.spamhaus.org/query/domain/$
.0-0m5tk.com
.0-1-hub.com
.zzzy1129.cn
:127.0.1.4:https://www.spamhaus.org/query/domain/$
.0-il.ml
.005verf-desj.com
.01accesfunds.com
""".split('\n')[1:-1]]

rows = []
raw_code = None
parsed_code = None
for line in data:
    line = line.rstrip('\n')
    if line.startswith(':127'):
        # Remember the most recent return-code line for the domains that follow
        raw_code = line
        parsed_code = re.split(":", line)[1]
        continue
    if line.startswith('#'):
        # Skip comment lines
        continue
    rows.append((line, line.count('.'), parsed_code, raw_code))

import pandas as pd

df = pd.DataFrame(rows, columns=[
    'domain_name', 'period_count', 'parsed_code',
    'raw_spamhaus_return_code'])
print(df)

Result:

         domain_name  period_count  parsed_code  \
0               test             0    127.0.1.2
1       .0-0m5tk.com             2    127.0.1.2
2       .0-1-hub.com             2    127.0.1.2
3       .zzzy1129.cn             2    127.0.1.2
4           .0-il.ml             2    127.0.1.4
5  .005verf-desj.com             2    127.0.1.4
6  .01accesfunds.com             2    127.0.1.4

                            raw_spamhaus_return_code
0  :127.0.1.2:https://www.spamhaus.org/query/doma...
1  :127.0.1.2:https://www.spamhaus.org/query/doma...
2  :127.0.1.2:https://www.spamhaus.org/query/doma...
3  :127.0.1.2:https://www.spamhaus.org/query/doma...
4  :127.0.1.4:https://www.spamhaus.org/query/doma...
5  :127.0.1.4:https://www.spamhaus.org/query/doma...
6  :127.0.1.4:https://www.spamhaus.org/query/doma...
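On the append() versus extend() part of the question: `list.append(x)` adds `x` as a single element, while `list.extend(iterable)` adds each item of the iterable separately. The sketch below reuses one domain and one return-code line from the sample data; the list names `appended`, `extended`, and `chars` are illustrative only. It suggests why appending a tuple gives one row per domain, while the `extend` variants from the question flatten the fields.

```python
line = '.0-il.ml'
parsed_code = '127.0.1.4'
raw_code = ':127.0.1.4:https://www.spamhaus.org/query/domain/$'

# append() adds the whole tuple as one element: one row with three fields.
appended = []
appended.append((line, parsed_code, raw_code))

# extend() adds each field as its own element: three loose strings, no row.
extended = []
extended.extend((line, parsed_code, raw_code))

print(len(appended))  # 1
print(len(extended))  # 3

# (raw_code) is just raw_code in parentheses, not a tuple, so extend()
# iterates over the string and adds it one character at a time.
chars = []
chars.extend((raw_code))
print(len(chars) == len(raw_code))  # True
```

For building a DataFrame row by row, appending one tuple per row (as the test code above does) keeps every row the same length as the column list.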

Regarding Python: append() and extend(), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56981256/
