
python - DataFrame - stop searching and export data once a match is found

Reposted. Author: 行者123. Updated: 2023-12-03 19:04:14

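I have a small program that searches many large files (500,000+ lines each) and exports the results to a CSV file. I would like to know whether the search can stop once a specific date has been found in the files. For example, after finding the ini_date value (column 2), e.g. 02/12/2020, the program should stop searching and export the results, which include the rows that contain "02/12/2020" in column 2 and also match the other search criteria.
Currently I have 73 datalog.log files in the folder, and the number keeps growing. datalog0.log is the oldest file and datalog72.log is the newest; after a while the newest will be datalog73.log (I want to start the search in the newest file). Can this be done with Python alone? If not, I will have to use SQL for this.
Here is my code: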

import pandas as pd
from glob import glob

files = glob('C:/ProgramA/datalog*.log')
df = pd.concat([pd.read_csv(f,
                            low_memory=False,
                            sep=',',
                            names=["0","1","2","3","4","5","6","7"]) for f in files])


#Column 0: IP
#Column 1: User
#Column 2: Date
#Column 3: Hour

ip = input('Optional - Set IP: ') #column 0
user = input('Optional - Set User: ') #column 1
ini_date = input('Mandatory - From Day (Format MM/DD/YYYY): ')
fin_date = input('Mandatory - To Day (Format MM/DD/YYYY): ')
ini_hour = input('Mandatory - From Hour (Format 00:00:00): ')
fin_hour = input('Mandatory - To Hour (Format 00:00:00): ')

if ip == '' and user == '':
    df1 = df[(df["2"] >= ini_date) & (df["2"] <= fin_date) & (df["3"] >= ini_hour) & (df["3"] <= fin_hour)]
elif ip == '':
    df1 = df[(df["1"] == user) & (df["2"] >= ini_date) & (df["2"] <= fin_date) & (df["3"] >= ini_hour) & (df["3"] <= fin_hour)]
elif user == '':
    df1 = df[(df["0"] == ip) & (df["2"] >= ini_date) & (df["2"] <= fin_date) & (df["3"] >= ini_hour) & (df["3"] <= fin_hour)]
else:
    df1 = df[(df["0"] == ip) & (df["1"] == user) & (df["2"] >= ini_date) & (df["2"] <= fin_date) & (df["3"] >= ini_hour) & (df["3"] <= fin_hour)]

df1.to_csv('C:/ProgramA/result.csv', index=False)
Thanks.

The logs look like the following example:
Yes, the logs are in chronological order and look like this:
File0:
1.1.1.1 user 09/24/2020 09:18:00 Other data...................
1.1.1.1 user 09/24/2020 10:00:00 Other data...................
1.1.1.1 user 09/25/2020 07:30:00 Other data...................
1.1.1.1 user 09/25/2020 09:30:00 Other data...................

File1:
1.1.1.1 user 09/26/2020 04:18:00 Other data...................
1.1.1.1 user 09/26/2020 10:00:00 Other data...................
1.1.1.1 user 09/26/2020 11:18:00 Other data...................
1.1.1.1 user 09/26/2020 12:00:00 Other data...................

File2:
1.1.1.1 user 09/26/2020 14:18:00 Other data...................
1.1.1.1 user 09/27/2020 16:00:00 Other data...................
1.1.1.1 user 09/28/2020 10:18:00 Other data...................
1.1.1.1 user 09/29/2020 12:00:00 Other data...................
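So, if I filter with ini_date >= "09/27/2020" and fin_date <= "09/27/2020", I would like the program to stop searching and export only this from File2 (otherwise the program checks the other 2 files unnecessarily, which takes more time):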
1.1.1.1 user 09/27/2020 16:00:00 Other data...................
1.1.1.1 user 09/28/2020 10:18:00 Other data...................
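
A note on the "start with the newest file" requirement: the question says a higher number in the file name means a newer file, so the files can be ordered by that number rather than by the order glob happens to return them in. The helper below is only a sketch; the name newest_first and the assumed datalog<N>.log pattern are illustrative and come from the file names mentioned in the question, not from the original code. Plain alphabetical sorting would not be enough, since "datalog9.log" sorts after "datalog72.log" as a string.

```python
import re


def newest_first(paths):
    """Order datalog<N>.log paths by N, highest (newest) first.

    Hypothetical helper: assumes the numeric suffix reflects file age,
    as described in the question.
    """
    def file_number(path):
        match = re.search(r"datalog(\d+)\.log$", path)
        # Paths that do not follow the pattern are pushed to the end.
        return int(match.group(1)) if match else -1

    return sorted(paths, key=file_number, reverse=True)
```

It could be applied to the glob result from the question, for example files = newest_first(glob('C:/ProgramA/datalog*.log')).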

Best answer

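import glob
import os
import pandas as pd

list_of_files = glob.glob('/path/to/folder/*')

# Sort files by os.path.getctime, newest first
# (creation time on Windows, last metadata change on Unix)
sorted_file_names = sorted(list_of_files, key=os.path.getctime, reverse=True)

rows_found = False
matching_frames = []
for file in sorted_file_names:
    # Column names follow the read_csv call in the question
    df = pd.read_csv(file, names=["0", "1", "2", "3", "4", "5", "6", "7"])

    # {Perform required operations}

    # Fetch the required rows (ini_date and fin_date come from the question's input() calls)
    df1 = df.loc[(df['2'] <= fin_date) & (df['2'] >= ini_date)]

    # If the current file has no matching rows but a previous file did, stop
    if not df1.empty:
        rows_found = True
        matching_frames.append(df1)
    elif rows_found:
        break

# DataFrame.append was removed in pandas 2.0, so the pieces are combined with pd.concat
final_df = pd.concat(matching_frames) if matching_frames else pd.DataFrame()
final_df.to_csv("Name.csv")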
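
One caveat with both snippets is that they compare MM/DD/YYYY dates as plain strings, which only orders them correctly within a single year ("12/31/2019" sorts after "01/02/2020" as a string). The sketch below is one possible variation, not the answerer's method: it parses the dates with pd.to_datetime and stops as soon as a file starts before ini_date, on the assumption stated in the question that the logs are chronological. The function name collect_matches, the DATE_FORMAT constant and the date_col parameter are illustrative assumptions.

```python
import pandas as pd

DATE_FORMAT = "%m/%d/%Y"  # assumed from the MM/DD/YYYY prompts in the question


def collect_matches(frames, ini_date, fin_date, date_col="2"):
    """Filter DataFrames supplied newest file first, stopping early.

    Hypothetical helper: `frames` is any iterable of DataFrames. Passing
    a generator means older files are never read once the loop breaks.
    """
    start = pd.to_datetime(ini_date, format=DATE_FORMAT)
    end = pd.to_datetime(fin_date, format=DATE_FORMAT)
    matches = []
    for df in frames:
        dates = pd.to_datetime(df[date_col], format=DATE_FORMAT)
        hit = df[(dates >= start) & (dates <= end)]
        if not hit.empty:
            matches.append(hit)
        # Logs are chronological, so if this file already begins before
        # ini_date, every older file lies entirely before ini_date.
        if dates.min() < start:
            break
    if not matches:
        return pd.DataFrame()
    # Files were visited newest first; reverse to get oldest-to-newest output.
    return pd.concat(matches[::-1], ignore_index=True)
```

A lazy generator such as (pd.read_csv(f, names=["0","1","2","3","4","5","6","7"]) for f in sorted_file_names) could be passed as frames, so that each file is read only when the loop reaches it. The hour, IP and user conditions from the question could be added to the mask in the same way.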

Regarding "python - DataFrame - stop searching and export data once a match is found", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64086645/
