gpt4 book ai didi

python - 文本到更清晰的文本到 Excel 电子表格项目

转载 作者:太空宇宙 更新时间:2023-11-04 05:00:27 26 4
gpt4 key购买 nike

我有一个充满 PC 数据的文本文件,组织为两种类型之一的 block 列表。要么:

*redacted*

我的目标是让 Python (3.6.2) 打开并读取文件,清理它,然后将数据编译成 excel 电子表格,如下所示:

Column 1: PC name 
Column 2: Error Type (0 if none, 1-4 for 4 error types)
Column 3: ID (if no error, no braces containing the ID)
Column 4: Password (if no error, just the password)

这是我的代码。我使用 Pycharm,并且在虚拟环境中:

import xlsxwriter

workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()

bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})

worksheet.set_column(0, 0, 14)
worksheet.set_column(1, 1, 5)
worksheet.set_column(2, 2, 38)
worksheet.set_column(3, 3, 55)

worksheet.write('A1', 'Name', bold)
worksheet.write('B1', 'Error', bold)
worksheet.write('C1', 'ID', bold)
worksheet.write('D1', 'Password', bold)

def nonblank_lines(f):
for l in f:
line = l.rstrip()
if line:
yield line.lstrip

with open("C:\\Users\\MyName\\Desktop\\BLRP.txt", "r+") as op:
gold_lst = []
nonblank = nonblank_lines(op)
for line in nonblank:
if line.startswith("Computer Name"):
gold_lst.append(str(line))
gold_lst.append("NO ERROR")
elif line.startswith("ID"):
gold_lst.append("IDG: " + str(line))
gold_lst.append('NO ERROR')
elif line.startswith("ERROR: An error occurred while"):
gold_lst.append('1')
gold_lst.append(str('ID: {' + line + '}'))
gold_lst.append(str('Password: '))
elif line.startswith("ERROR: No key"):
gold_lst.append('2')
gold_lst.append(str('ID: {' + line + '}'))
gold_lst.append(str('Password: '))
elif line.startswith("ERROR: An error occurred (code 0x80070057)"):
gold_lst.append('3')
gold_lst.append(str('ID: {' + line + '}'))
gold_lst.append(str('Password: '))
elif line.startswith("ERROR: An error occurred (code 0x8004100e)"):
gold_lst.append('4')
gold_lst.append(str('ID: {' + line + '}'))
gold_lst.append(str('Password: '))
elif line.startswith("Password"):
gold_lst.append(str('Password: ' + next(nonblank)))
print(gold_lst)
op.close()

pc_data = (gold_lst)

row = 1
col = 0

for obj in pc_data:
if obj.startswith("Computer Name"):
worksheet.write_string(row, col, obj[15:])
elif obj.startswith('NO'):
worksheet.write_number(row, col + 1, 0, left)
elif obj.startswith('1'):
worksheet.write_number(row, col + 1, int(obj), left)
elif obj.startswith('2'):
worksheet.write_number(row, col + 1, int(obj), left)
elif obj.startswith('3'):
worksheet.write_number(row, col + 1, int(obj), left)
elif obj.startswith('4'):
worksheet.write_number(row, col + 1, int(obj), left)
elif obj.startswith("ID: {ERROR"):
worksheet.write_string(row, col + 2, '')
elif obj.startswith("IDG: "):
worksheet.write_string(row, col + 2, obj[10:-1])
elif obj.startswith("Password"):
worksheet.write_string(row, col + 3, obj[9:])
row += 1

workbook.close()

现在,这对有问题的文件非常有效,但是,除了非常次优的代码之外,我敢肯定,还有一些我可以明确看到需要改进的地方。在这个 block 中:

if line.startswith("Computer Name"):
gold_lst.append(str(line))
gold_lst.append("NO ERROR")

如果我的行以“计算机名”开头并且下一个非空行不以“错误”开头,我只希望将“无错误”附加到我的列表中。当然,我试过这个:

if line.startswith("Computer Name"):
if next(nonblank).startswith("ERROR"):
gold_lst.append(str(line))
elif next(nonblank).startswith("VOLUME"):
gold_lst.append(str(line))
gold_lst.append("NO ERROR")

问题是,这会创建一个被顶起的 excel 电子表格,我完全不知道为什么。即使在主代码的后续步骤中我打印 gold_lst(只是为了检查是否列表是正确的),列表非常不准确。我什至无法弄清楚列表的内容。

我该如何解决这个问题?

关于第二个问题,如果我在同一个话题中问的话,我以后可能会收到的这种类型的更一般的文本文件可能包含多个ID和密码的计算机。如果我不得不猜测的话,这个 block 看起来像这样:

*redacted*

并且可能有超过 2 个这样的 ID/密码组合。 我如何修改我的代码以允许这样做?目前,我的代码不会轻易解决这个问题。我是 Python 的新手,所以也许可以,但我没有看到。

最佳答案

解决这个问题的一种方法如下:

  1. 读入整个文件,跳过任何空行。
  2. 使用 Python 的 groupby()根据 Computer Name 行将行列表拆分为 block 的函数。
  3. 对于每个 block ,尝试提取错误以及 ID 和密码列表。如果不存在,请留空。
  4. 对于每个 block ,将提取的所有数据写入电子表格的下一行。

脚本如下:

from itertools import groupby
import xlsxwriter
import re

workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()

bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
cols = [('Name', 14), ('Error', 5), ('ID1', 38), ('Password1', 55), ('ID2', 38), ('Password2', 55), ('ID3', 38), ('Password3', 55)]

for colx, (heading, width) in enumerate(cols):
worksheet.write_string(0, colx, heading, bold)
worksheet.set_column(colx, colx, width)

rowy = 1
lines = []
data = []
computer_name = None

with open('BLRP.txt') as f_input:
lines = [line.strip() for line in f_input if len(line.strip())]

for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
if k:
computer_name = re.search(r'Computer Name:\s*(.*)\s*', list(g)[0]).group(1)
elif computer_name:
block = list(g)
error = 'NO ERROR'
ids = []
passwords = []

for line_number, line in enumerate(block):
re_error = re.match('ERROR:\s+"(.*?)"', line)

if re_error:
error = re_error.group(1)

if line.startswith('Numerical Password:'):
ids.append(re.search('\{(.*?)\}', block[line_number+1]).group(1))
passwords.append(block[line_number+3].strip())

worksheet.write_string(rowy, 0, computer_name)
worksheet.write_string(rowy, 1, error)

for index, (id, pw) in enumerate(zip(ids, passwords)):
worksheet.write_string(rowy, index * 2 + 2, id)
worksheet.write_string(rowy, index * 2 + 3, pw)

rowy += 1 # Advance to the next output row

workbook.close()

假设你的BLRP.txt如下:

Computer Name: "Name Here1"

ERROR: "some type of error"

Blah blah
Blah blah
Blah blah

Computer Name: "Name Here2"

Volume blah blah
Blah Blah

Numerical Password:
ID: {"The ID1 is here; long string of random chars"}
Password:
"Password1 here; also a long string"

Blah Blah
Blah Blah

Numerical Password:
ID: {"The ID2 is here; long string of random chars"}
Password:
"Password2 here; also a long string"

Blah Blah
Blah Blah

Numerical Password:
ID: {"The ID3 is here; long string of random chars"}
Password:
"Password3 here; also a long string"

Blah Blah
Blah Blah

您将得到如下电子表格:

Spreadsheet screenshot

groupby() 是如何工作的?

通常,当您遍历列表时,它会一次为您提供一个条目。使用 groupby(),您可以以“组”的形式迭代此列表,其中每个组中的项目数基于条件。条件以函数的形式提供(我使用 lambda 来避免编写单独的函数)。

groupby() 将建立要返回的项目组,直到函数的结果发生变化。在本例中,该函数正在查找以单词 Computer Name 开头的行。因此,如果为真,它将返回一个项目(除非相邻的两行上有 Computer Name)。接下来它将返回所有不以 Computer Name 开头的行,依此类推。

它返回两个东西,一个key 和一个groupkey 是函数 startswith() 的结果,它将是 TrueFalsegroup 是一个包含所有匹配项的可迭代对象。 list(g) 用于将其转换为普通列表,在本例中,返回下一个 Computer Name 行之前的所有行。


将条目写入不同的行并将已知错误消息转换为数字:

from itertools import groupby
import xlsxwriter
import re

workbook = xlsxwriter.Workbook('Computer Data.xlsx')
worksheet = workbook.add_worksheet()

bold = workbook.add_format({'bold': True})
left = workbook.add_format({'align': 'justify'})
cols = [('Name', 14), ('Error', 5), ('ID', 38), ('Password', 55)]

for colx, (heading, width) in enumerate(cols):
worksheet.write_string(0, colx, heading, bold)
worksheet.set_column(colx, colx, width)

rowy = 1
lines = []
data = []
computer_name = None

error_numbers = {
'An error occurred while connecting to the BitLocker management interface.' : 1,
'No key protectors found.' : 2,
'An error occurred (code 0x80070057):' : 3,
'An error occurred (code 0x8004100e):' : 4}

with open('BLRP.txt') as f_input:
lines = [line.strip() for line in f_input if len(line.strip())]

for k, g in groupby(lines, lambda x: x.startswith("Computer Name:")):
block = list(g)

if k:
computer_name = re.search(r'Computer Name:\s*(.*)\s*', block[0]).group(1)
elif computer_name:
error_number = 0 # 0 for NO ERROR
ids = []
passwords = []

for line_number, line in enumerate(block):
re_error = re.match('ERROR:\s+?(.*)\s*?', line)

if re_error:
error = re_error.group(1)
error_number = error_numbers.get(error, -1) # Return -1 for an unknown error

if line.startswith('Numerical Password:'):
ids.append(re.search('\{(.*?)\}', block[line_number+1]).group(1))
passwords.append(block[line_number+3].strip())

worksheet.write_string(rowy, 0, computer_name)
worksheet.write_number(rowy, 1, error_number)

for id, pw in zip(ids, passwords):
worksheet.write_string(rowy, 0, computer_name)
worksheet.write_number(rowy, 1, error_number)
worksheet.write_string(rowy, 2, id)
worksheet.write_string(rowy, 3, pw)
rowy += 1 # Advance to the next output row

if len(ids) == 0:
rowy += 1 # Advance to the next output row

workbook.close()

关于python - 文本到更清晰的文本到 Excel 电子表格项目,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45874540/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com