
python - How do I read a CSV containing grouped data, where each group has different columns?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:19:54

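I have just started learning Python. As part of that, I have a task to parse a CSV file and compare it with another file in the same format.

The CSV reads as follows: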

"first-report","10/01/2019 at  18:54:55"
"Tags Company","B2 603, Belcastel","MV Street, (near Orbis School - 2)","Pune","Maharashtra","India","1"
"James Kooney","sants_rn","Manager"
"Groups","IPs","Hosts","Hosts Matching Filters","Analysis","Date Range","Network","Tags"
"null","NONE","0","0","scans","N/A","ALL","NONE"

"Total Vulnerabilities","Avg Risk","Business Risk"
"17","2.8","14/100"

"IP","Network","Total Vulnerabilities","Security Risk"
"10.10.10.10","Global Default Network","17","2.8"

by Status
"Status","Confirmed","Potential","Total"
"New","1","3","4"
"Active","0","0","0"
"Re-Opened","0","0","0"
"Total","1","3","4"
"Fixed","0","0","0"
"Changed","1","3","4"

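As the sample data shows, the CSV has no fixed columns; the data is split into different groups. I want to compare the following keys in the groups of the CSV above and print the differences to a summary file wherever a key's value does not match. For example: difference found at line 14, expected "new", found "event".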

"Groups","IPs","Hosts","Hosts Matching Filters","Analysis","Date Range","Network","Tags"
"Total Vulnerabilities","Avg Risk","Business Risk"
"IP","Network","Total Vulnerabilities","Security Risk"
"Status","Confirmed","Potential","Total"

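Can someone point me towards the best solution?

I have been looking at different options, but with no luck so far. My approach was to use csv.DictReader and compare each key, but because the number of columns varies, I am running into indexing problems.

Here is the sample code I wrote.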

    import csv

    # summary, actualoutput, expectedoutput and fieldnames are defined elsewhere.
    summary = open(summary, 'w')
    actualcsvdict = csv.DictReader(open(actualoutput), fieldnames=fieldnames)
    expectedcsvdict = csv.DictReader(open(expectedoutput), fieldnames=fieldnames)

    actualcsvrows = list(actualcsvdict)
    expectedcsvrows = list(expectedcsvdict)
    print(len(actualcsvrows))
    for line in range(len(actualcsvrows)):
        if actualcsvrows[line] != expectedcsvrows[line]:
            summary.write(f"\nMismatch found at line number {line + 2}\n")
            for key1 in actualcsvrows[line]:
                if actualcsvrows[line][key1] != expectedcsvrows[line][key1]:
                    summary.write(
                        f"For {key1} column, Expected value was [ {expectedcsvrows[line][key1]} ] Found [ {actualcsvrows[line][key1]} ]\n")

P.S. The field names in this example are:

"Status","Confirmed","Potential","Total"
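To make the indexing problem concrete, here is a minimal sketch of how `csv.DictReader` treats rows that are narrower or wider than a fixed `fieldnames` list. It uses three rows from the sample report; the `sample` string and the `rows` variable are names chosen only for this illustration.

```python
import csv
import io

# Three data rows from the sample report: 3, 8 and 4 columns wide.
sample = (
    '"17","2.8","14/100"\n'
    '"null","NONE","0","0","scans","N/A","ALL","NONE"\n'
    '"New","1","3","4"\n'
)

fieldnames = ["Status", "Confirmed", "Potential", "Total"]
rows = list(csv.DictReader(io.StringIO(sample), fieldnames=fieldnames))

# A short row gets the default restval (None) for the missing fields.
print(rows[0]["Total"])

# A long row keeps its surplus values in a list under the default restkey (None).
print(rows[1][None])

# Only a row that really has four columns maps cleanly onto the field names.
print(rows[2])
```

Because every row is forced onto the same four keys, values from other groups end up under misleading column names, which is probably the source of the confusion described above.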

Best answer

For your specific case you don't need the DictReader class; the plain reader is enough.

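    import csv

    # actualoutput, expectedoutput and summary are file paths defined elsewhere.
    with open(actualoutput, newline='') as f:
        actualrows = list(csv.reader(f))
    with open(expectedoutput, newline='') as f:
        expectedrows = list(csv.reader(f))

    with open(summary, 'w') as out:
        # csv.reader yields [] for a blank line, so index + 1 is the file line
        # number as long as no quoted field spans several lines.
        # zip stops at the shorter file; rows beyond that are not compared.
        for lineno, (actual, expected) in enumerate(zip(actualrows, expectedrows), start=1):
            if actual != expected:
                out.write(f"\nMismatch found at line number {lineno}\n")
                for act, exp in zip(actual, expected):
                    if act != exp:
                        out.write(f"Expected {exp}, got {act}\n")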
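If the comparison should cover only the four groups listed in the question, one possible approach is to treat those header rows as markers and compare just the data rows that follow them. The sketch below is a rough illustration under that assumption, not a tested solution for the real reports. The names `HEADERS`, `read_groups` and `diff_groups` are hypothetical, and the sketch assumes both files contain the same groups in the same order.

```python
import csv
import io

# Header rows of the groups to compare, as listed in the question.
HEADERS = [
    ("Groups", "IPs", "Hosts", "Hosts Matching Filters",
     "Analysis", "Date Range", "Network", "Tags"),
    ("Total Vulnerabilities", "Avg Risk", "Business Risk"),
    ("IP", "Network", "Total Vulnerabilities", "Security Risk"),
    ("Status", "Confirmed", "Potential", "Total"),
]

def read_groups(fileobj, headers=HEADERS):
    """Return (line_number, header, values) for each data row under a known header."""
    known = set(headers)
    current = None
    records = []
    reader = csv.reader(fileobj)
    for row in reader:
        key = tuple(row)
        if not row:              # a blank line ends the current group
            current = None
        elif key in known:       # a known header row starts a new group
            current = key
        elif current is not None:
            records.append((reader.line_num, current, row))
    return records

def diff_groups(actual_file, expected_file):
    """Return one message per differing cell in the selected groups."""
    messages = []
    pairs = zip(read_groups(actual_file), read_groups(expected_file))
    for (lineno, header, act), (_, _, exp) in pairs:
        for column, a, e in zip(header, act, exp):
            if a != e:
                messages.append(
                    f"Line {lineno}, column {column!r}: expected {e!r}, found {a!r}")
    return messages

# Small in-memory demo; real code would pass files opened with newline=''.
expected_text = 'by Status\n"Status","Confirmed","Potential","Total"\n"New","1","3","4"\n'
actual_text = 'by Status\n"Status","Confirmed","Potential","Total"\n"Active","1","3","4"\n'
messages = diff_groups(io.StringIO(actual_text), io.StringIO(expected_text))
for message in messages:
    print(message)
```

Rows that do not sit under one of the listed headers, such as the report title or the "by Status" label, are ignored, so differences there would not be reported.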

But honestly, I think your problem could be solved with the difflib library, depending on your exact needs.
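As a rough idea of what the difflib route could look like, the sketch below compares two reports as plain lines of text with `difflib.unified_diff`. The helper name `diff_report` and the two sample lists are made up for this illustration. In practice the lines would come from the files, for example via `open(path).read().splitlines()`.

```python
import difflib

def diff_report(actual_lines, expected_lines):
    """Return unified-diff lines describing how actual differs from expected."""
    return list(difflib.unified_diff(
        expected_lines,
        actual_lines,
        fromfile="expected",
        tofile="actual",
        lineterm="",   # the input lines carry no trailing newline
        n=0,           # no context lines, only the changed ones
    ))

expected = ['"Status","Confirmed","Potential","Total"', '"New","1","3","4"']
actual = ['"Status","Confirmed","Potential","Total"', '"Active","1","3","4"']

result = diff_report(actual, expected)
for line in result:
    print(line)
```

This works on raw text, so it reports whole lines that changed rather than individual columns. Whether that is enough depends on how detailed the summary needs to be.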

Regarding "python - How do I read a CSV containing grouped data, where each group has different columns?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58229847/
