
python - How to structure multi-line log data into one row of data in Python?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:19:05

Below are my log records:

30/10/2016 17:18:51 [13] 10-Full: L 1490; A 31; F 31; S 31; DL 0; SL 0; DT 5678
30/10/2016 17:18:51 [13] 00-Always: Returning 31 matches
30/10/2016 17:18:51 [13] 30-Normal: Query complete
30/10/2016 17:18:51 [13] 30-Normal: Request completed in 120 ms.
30/10/2016 17:19:12 [15] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:12 [15] 00-Always: action=Query&Text=(("XXXXXX":*/DOCUMENT/DRECONTENT/ObjectInfo/type+OR+"XXXXXX":*/DOCUMENT/.....
30/10/2016 17:19:12 [15] 10-Full: L 2; A 1; F 1; S 0; DL 0; SL 0; DT 5373
30/10/2016 17:19:12 [15] 00-Always: Returning 0 matches
30/10/2016 17:19:12 [15] 30-Normal: Query complete
30/10/2016 17:19:12 [15] 30-Normal: Request completed in 93 ms.
30/10/2016 17:19:20 [17] 00-Always: Request from 120.0.0.1
30/10/2016 17:19:20 [17] 00-Always: action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX/type+AND+XXXXXX.......
30/10/2016 17:19:51 [19] 10-Full: L 255; A 0; F 0; S 0; DL 0; SL 0; DT 5021
30/10/2016 17:19:51 [19] 00-Always: Returning 0 matches
30/10/2016 17:19:51 [19] 30-Normal: Query complete
30/10/2016 17:19:51 [19] 30-Normal: Request completed in 29 ms.
30/10/2016 17:20:44 [27] 00-Always: Request from 120.0.0.1
30/10/2016 17:20:44 [27] 00-Always: action=Query&Tex(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+(
30/10/2016 17:20:44 [27] 10-Full: L 13; A 0; F 0; S 0; DL 0; SL 0; DT 5235
30/10/2016 17:20:44 [27] 00-Always: Returning 0 matches
30/10/2016 17:20:44 [27] 30-Normal: Query complete
30/10/2016 17:20:44 [27] 30-Normal: Request completed in 27 ms.
30/10/2016 17:21:09 [25] 00-Always: Request from 120.0.0.1
30/10/2016 17:21:09 [25] 00-Always: action=Query&Text=XXXXXX:*/DOCUMENT/DRECONTENT/ObjectIn

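This is my dataset. There are millions of these lines. I want to analyze how long each query took, who the query came from, and what the request looked like. I want to hide everything else.

My expected output: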

30/10/2016;17:19:12;Request completed in 93 ms.;Request from 120.0.0.1;action=Query&Text=((PDF:*/DOCUMENT/DRECONTENT/XXXXX....
30/10/2016;17:18:51;Request completed in 120 ms.;Request from 120.0.0.1;action=Query&Text=(("EOM.CompoundStory":*/DOCUMENT/DRECONTE....
30/10/2016;17:19:51;Request completed in 29 ms.;Request from 120.0.0.1;action=Query&Text=(Image:*/DOCUMENT/DRECONTENT/ObjectInfo/type+AND+((.....
30/10/2016;17:20:44;Request completed in 27 ms.;Request from 120.0.0.1;action=Query&Text=XXXXX:*/DOCUMENT/DRECONT....

If possible, I would like to solve this in Python with pandas. I already have an approach:

import csv
import pandas

with open('query.csv', 'rt') as f, open('leertest.csv', 'w') as outf:
    reader = csv.reader(f, delimiter=' ')
    writer = csv.writer(outf, delimiter=';', quoting=csv.QUOTE_MINIMAL)
    for row in reader:
        for field in row:
            # print every row that contains the word "Request"
            if field == "Request":
                print(row)

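Unfortunately it did not work. Maybe you have a better approach.

I am also happy to look at new techniques that do not take long to learn.

Best answer

With pandas, you can do the following: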

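import pandas as pd

column_headers = ['Date', 'Time', 'Duration', 'IP', 'Request']
df = pd.DataFrame([], columns=column_headers)
df.to_csv('out.log', index=False, sep=';')

# if you don't want to include a header line, skip the three header lines above and start here
# a regex separator needs the python engine; the raw string avoids an invalid-escape warning
for chunk in pd.read_csv('data.log', sep=r'\s', header=None, chunksize=6, engine='python'):
    if len(chunk) < 6:
        break  # incomplete trailing group, nothing to extract
    chunk = chunk.reset_index(drop=True).fillna('')
    # rows 3, 4 and 5 of each group hold the duration, the client and the query text
    d = pd.DataFrame([chunk.loc[3, 0], chunk.loc[3, 1],
                      ' '.join(chunk.loc[3, 4:8]),
                      ' '.join(chunk.loc[4, 4:6]),
                      ' '.join(chunk.loc[5, 4:]).strip()])  # strip padding left by empty columns
    d.T.to_csv('out.log', index=False, header=False, mode='a', sep=';')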

Or a non-pandas approach:

column_headers = ['Date', 'Time', 'Duration', 'IP', 'Request']

with open('data.log') as log, open('out.log', 'w') as out:
    out.write(';'.join(column_headers)+'\n') # skip this line if you don't want to include column headers
    while True:
        try:
            lines = [next(log).strip('\n').split(' ',4) for i in range(6)][3:]
            out.write(';'.join(lines[0][:2]+[l[4] for l in lines])+'\n')
        except StopIteration:
            break

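Both of the above work in almost the same way. They read six lines at a time from your file (which I named data.log), since from your sample that appears to be the number of lines per group. Each then uses list slicing or the pandas .loc indexer to pick the relevant values out of each line. Finally, the relevant values are joined with ; and appended to the end of the output file (which I named out.log).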
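To make the slicing step concrete, here is a minimal sketch that converts a single six-line group into one output row. The helper name `group_to_row` is hypothetical, and the query text in the last sample line is a shortened placeholder rather than a value from the real log. It assumes every group has exactly six lines, with the duration, client and query text on the last three.

```python
def group_to_row(group):
    """Turn one six-line group into a single semicolon-separated row."""
    # Split each line into date, time, thread id, log level and message.
    # maxsplit=4 keeps the spaces inside the message intact.
    parts = [line.rstrip('\n').split(' ', 4) for line in group]
    tail = parts[3:]                  # the last three lines of the group
    date, time = tail[0][:2]          # taken from the "Request completed" line
    messages = [p[4] for p in tail]   # duration, client, query text
    return ';'.join([date, time] + messages)


sample = [
    "30/10/2016 17:18:51 [13] 10-Full: L 1490; A 31; F 31; S 31; DL 0; SL 0; DT 5678",
    "30/10/2016 17:18:51 [13] 00-Always: Returning 31 matches",
    "30/10/2016 17:18:51 [13] 30-Normal: Query complete",
    "30/10/2016 17:18:51 [13] 30-Normal: Request completed in 120 ms.",
    "30/10/2016 17:19:12 [15] 00-Always: Request from 120.0.0.1",
    "30/10/2016 17:19:12 [15] 00-Always: action=Query&Text=example",  # placeholder query
]
print(group_to_row(sample))
# 30/10/2016;17:18:51;Request completed in 120 ms.;Request from 120.0.0.1;action=Query&Text=example
```

Note that the date and time come from the "Request completed" line, so each row pairs a completed request with the request that follows it in the log, just as the expected output in the question does.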

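Note that both examples avoid loading the entire file into memory at once, since with millions of lines of data that could cause problems or slow things down considerably.

Edit

I updated the examples above to show how to add column headers. If you don't want column headers, skip the three lines of the pandas example that build and write the header DataFrame, and skip the first line after the with statement in the non-pandas example.

Regarding python - How to structure multi-line log data into one row of data in Python?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40936115/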
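One way to see why memory use stays flat is to write the grouping as a generator. The sketch below is not part of the original answer: `iter_groups` is a hypothetical helper built on `itertools.islice`, and it assumes that an incomplete group at the end of the input can simply be dropped.

```python
from itertools import islice


def iter_groups(lines, size=6):
    """Yield lists of `size` consecutive lines, one group at a time."""
    it = iter(lines)
    while True:
        group = list(islice(it, size))
        if len(group) < size:   # end of input, or an incomplete trailing group
            return
        yield group


# Works on any iterator. A file object from open('data.log') would be
# consumed line by line in the same way, holding only one group at a time.
fake_log = (f"line {i}" for i in range(14))
groups = list(iter_groups(fake_log))
print(len(groups))   # 2 complete groups; the 2 leftover lines are dropped
```

Only `size` lines are held at any moment, so the approach scales to a file with millions of lines as long as each group is processed and written out before the next one is read.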
