
Python difflib: compare two CSV files and highlight word-level differences in the HTML output

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:09:46

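I'm not an expert in Python. I tried my best to find an answer but couldn't. Please forgive me if this is a duplicate question, and point me in the right direction if you can.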

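I'm trying to use Python difflib to compare two CSV files and generate the diff output as an HTML page. The command-line diff example in the difflib documentation has an -m option that produces a side-by-side HTML view of the two CSV files with the differences highlighted.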

Internally, difflib finds the differences with difflib.SequenceMatcher and builds the HTML page with difflib.HtmlDiff.make_file. However, the output it produces is not what I want.
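A small illustration of why the default output highlights single letters: when SequenceMatcher is given two strings it compares them character by character, but when it is given two lists it compares whole elements. The row is taken from the sample files below; splitting on "," is an assumption that the CSV has no quoted fields containing commas.

```python
import difflib

# Two strings: the elements being matched are individual characters.
char_ops = difflib.SequenceMatcher(None, "Sinson", "Simpson").get_opcodes()
print(char_ops)
# [('equal', 0, 2, 0, 2), ('replace', 2, 3, 2, 4), ('equal', 3, 6, 4, 7)]

# Two lists: the elements being matched are whole comma-separated fields.
old_fields = "1004,General Knowledge,Sinson,2007,465".split(",")
new_fields = "1004,General Knowledge,Simpson,2007,465".split(",")
field_ops = difflib.SequenceMatcher(None, old_fields, new_fields).get_opcodes()
print(field_ops)
# [('equal', 0, 2, 0, 2), ('replace', 2, 3, 2, 3), ('equal', 3, 5, 3, 5)]
```

The first result is exactly the `Si[n]son` / `Si[mp]son` highlight seen in the default HTML; the second one treats the whole author field as the unit of change.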

The default HTML output I currently get from difflib is included further below.

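What I want instead is word-level highlighting, rather than highlighting the individual changed characters or character sequences. If anything changes between the old and the new file, I want the whole word highlighted.

In other words, the change I want highlighted is a word-level highlight of the text.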
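One way to get this, sketched here as an alternative to HtmlDiff rather than as part of difflib's own API: split each row into fields, run SequenceMatcher on the two field lists, and wrap every changed field in difflib's CSS classes. `field_diff_html` is a hypothetical helper name; it assumes plain comma-separated rows without quoted commas, and unlike HtmlDiff it does not turn spaces into `&nbsp;`.

```python
import difflib
import html


def field_diff_html(old_row, new_row, sep=","):
    """Return (old_html, new_html) where every changed field is wrapped whole
    in difflib's CSS classes: diff_sub on the old side, diff_add on the new."""
    old_fields = old_row.split(sep)  # assumption: no quoted separators
    new_fields = new_row.split(sep)
    matcher = difflib.SequenceMatcher(None, old_fields, new_fields, autojunk=False)
    old_out, new_out = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' fields are copied; 'replace'/'delete'/'insert' fields are marked
        for field in old_fields[i1:i2]:
            text = html.escape(field)
            old_out.append(text if tag == "equal" else f'<span class="diff_sub">{text}</span>')
        for field in new_fields[j1:j2]:
            text = html.escape(field)
            new_out.append(text if tag == "equal" else f'<span class="diff_add">{text}</span>')
    return sep.join(old_out), sep.join(new_out)


old_html, new_html = field_diff_html("1004,General Knowledge,Sinson,2007,465",
                                     "1004,General Knowledge,Simpson,2007,465")
print(old_html)
print(new_html)
```

Because SequenceMatcher aligns fields by content rather than by position, a row in which a value moves (for example `2008` in row 1001) may be paired differently from the expected output.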

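Please help me with this. Is this actually possible with difflib, or do I have to use some other tool or module? I tried vimdiff and other plugins but got nowhere. I'm open to any suggestion.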

The code I'm using is taken from the Python difflib documentation page:

import sys, os, time, difflib, optparse

def main():
    # ... option parsing from the documentation example elided ...
    n = options.lines  # I set this to 0 (numlines is hard-coded to 0 below)
    fromfile, tofile = args  # as specified in the usage string

    # file modification times (used by the other diff formats, not by make_file)
    fromdate = time.ctime(os.stat(fromfile).st_mtime)
    todate = time.ctime(os.stat(tofile).st_mtime)

    # the 'U' open mode was removed in Python 3.11; universal newlines are the default
    with open(fromfile) as f:
        fromlines = f.readlines()
    with open(tofile) as f:
        tolines = f.readlines()

    diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                        tofile, context=True,
                                        numlines=0)

    # make_file returns the whole HTML page as a single string
    sys.stdout.write(diff)
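To reproduce the character-level output without the command-line wrapper, here is a self-contained sketch that runs the same `HtmlDiff().make_file(..., context=True, numlines=0)` call on two rows copied from the sample files. The in-memory lists stand in for the files, and `csv_diff_html` is a hypothetical name.

```python
import difflib


def csv_diff_html(fromlines, tolines, fromfile="old.csv", tofile="new.csv"):
    """Side-by-side HTML diff that shows only the changed rows,
    using the same make_file arguments as the script above."""
    return difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile,
                                        context=True, numlines=0)


old = ["refno,title,author,year,price\n",
       "1004,General Knowledge,Sinson,2007,465\n"]
new = ["refno,title,author,year,price\n",
       "1004,General Knowledge,Simpson,2007,465\n"]
page = csv_diff_html(old, new)

# The intraline change is marked per character: Si<n>son versus Si<mp>son.
with open("diff.html", "w", encoding="utf-8") as out:
    out.write(page)
```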

old.csv

refno,title,author,year,price
1001,CPP,MILTON,2008,456
1002,JAVA,Gilson,2002,456
1003,Adobe Flex,2010,566
1004,General Knowledge,Sinson,2007,465
1005,Actionscript,Gilto,2008,480

new.csv

refno,title,author,year,price
1001,CPP,MILTON,2010,456,2008
1002,JAVA,Gilson,2002
1003,Adobe Flexi,Johnson,2010,566
1004,General Knowledge,Simpson,2007,465
105,Action script,Gilto,2008,480
2000,Drama,DayoNe,,2020,560

I've also added the default HTML diff output and the expected HTML diff output below.

Default HTML diff output from difflib:

<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>

<body>

<table class="diff" id="difflib_chg_to0__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<thead><tr><th class="diff_next"><br /></th><th colspan="2" class="diff_header">old.csv</th><th class="diff_next"><br /></th><th colspan="2" class="diff_header">new.csv</th></tr></thead>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to0__0"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,200<span class="diff_sub">8</span>,456</td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,20<span class="diff_add">1</span>0,456<span class="diff_add">,2008</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002<span class="diff_sub">,456</span></td><td class="diff_next"></td><td class="diff_header" id="to0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_4">4</td><td nowrap="nowrap">1003,Adobe&nbsp;Flex,2010,566</td><td class="diff_next"></td><td class="diff_header" id="to0_4">4</td><td nowrap="nowrap">1003,Adobe&nbsp;Flex<span class="diff_add">i,Johnson</span>,2010,566</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_5">5</td><td nowrap="nowrap">1004,General&nbsp;Knowledge,Si<span class="diff_chg">n</span>son,2007,465</td><td class="diff_next"></td><td class="diff_header" id="to0_5">5</td><td nowrap="nowrap">1004,General&nbsp;Knowledge,Si<span class="diff_chg">mp</span>son,2007,465</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_6">6</td><td nowrap="nowrap">1<span class="diff_sub">0</span>05,Actionscript,Gilto,2008,480</td><td class="diff_next"></td><td class="diff_header" id="to0_6">6</td><td nowrap="nowrap">105,Action<span class="diff_add">&nbsp;</span>script,Gilto,2008,480</td></tr>
<tr><td class="diff_next"></td><td class="diff_header"></td><td nowrap="nowrap"></td><td class="diff_next"></td><td class="diff_header" id="to0_7">7</td><td nowrap="nowrap"><span class="diff_add">2000,Drama,DayoNe,,2020,560</span></td></tr>
</tbody>
</table>

</body>

</html>

Expected HTML diff output:

<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1" />
<title></title>
<style type="text/css">
table.diff {font-family:Courier; border:medium;}
.diff_header {background-color:#e0e0e0}
td.diff_header {text-align:right}
.diff_next {background-color:#c0c0c0}
.diff_add {background-color:#aaffaa}
.diff_chg {background-color:#ffff77}
.diff_sub {background-color:#ffaaaa}
</style>
</head>

<body>

<table class="diff" id="difflib_chg_to0__top"
cellspacing="0" cellpadding="0" rules="groups" >
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<colgroup></colgroup> <colgroup></colgroup> <colgroup></colgroup>
<thead><tr><th class="diff_next"><br /></th><th colspan="2" class="diff_header">old.csv</th><th class="diff_next"><br /></th><th colspan="2" class="diff_header">new.csv</th></tr></thead>
<tbody>
<tr><td class="diff_next" id="difflib_chg_to0__0"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="from0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,<span class="diff_sub">2008</span>,456</td><td class="diff_next"><a href="#difflib_chg_to0__top">t</a></td><td class="diff_header" id="to0_2">2</td><td nowrap="nowrap">1001,CPP,MILTON,<span class="diff_add">2010</span>,456<span class="diff_add">,2008</span></td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002<span class="diff_sub">,456</span></td><td class="diff_next"></td><td class="diff_header" id="to0_3">3</td><td nowrap="nowrap">1002,JAVA,Gilson,2002</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_4">4</td><td nowrap="nowrap">1003,<span class="diff_sub">Adobe&nbsp;Flex</span>,2010,566</td><td class="diff_next"></td><td class="diff_header" id="to0_4">4</td><td nowrap="nowrap">1003,<span class="diff_add">Adobe&nbsp;Flexi</span>,<span class="diff_add">Johnson</span>,2010,566</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_5">5</td><td nowrap="nowrap">1004,General&nbsp;Knowledge,<span class="diff_sub">Sinson</span>,2007,465</td><td class="diff_next"></td><td class="diff_header" id="to0_5">5</td><td nowrap="nowrap">1004,General&nbsp;Knowledge,<span class="diff_add">Simpson</span>,2007,465</td></tr>
<tr><td class="diff_next"></td><td class="diff_header" id="from0_6">6</td><td nowrap="nowrap"><span class="diff_sub">1005</span>,<span class="diff_sub">Actionscript</span>,Gilto,2008,480</td><td class="diff_next"></td><td class="diff_header" id="to0_6">6</td><td nowrap="nowrap"><span class="diff_add">105</span>,<span class="diff_add">Action&nbsp;script</span>,Gilto,2008,480</td></tr>
<tr><td class="diff_next"></td><td class="diff_header"></td><td nowrap="nowrap"></td><td class="diff_next"></td><td class="diff_header" id="to0_7">7</td><td nowrap="nowrap"><span class="diff_add">2000,Drama,DayoNe,,2020,560</span></td></tr>
</tbody>
</table>

</body>

</html>

Best answer

Question: I am looking for a word-level highlight

Implement a class Comma_HtmlDiff that extends the highlighting to comma boundaries.
For this you have to override difflib.ndiff:

Note: Only the first highlighted part of each line is expanded.
If difflib.ndiff's highlight spans a comma, this is not corrected.
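The class rewrites the "? " guide lines that difflib.ndiff emits. A quick way to see them, using a row from the sample data: each guide line marks, column by column, the characters of the line above it with `^` (changed), `-` (deleted) or `+` (inserted), and Comma_HtmlDiff stretches those marks out to the surrounding commas.

```python
import difflib

old = ["1004,General Knowledge,Sinson,2007,465"]
new = ["1004,General Knowledge,Simpson,2007,465"]

# ndiff yields '- ' (old line), '? ' (guide), '+ ' (new line), '? ' (guide)
lines = list(difflib.ndiff(old, new))
for line in lines:
    print(repr(line))
```

The first guide line has a single `^` under the `n` of `Sinson`, the second has `^^` under the `mp` of `Simpson`; widening those marks to the whole field is what turns the highlight into a word-level one.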

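import difflib

class Comma_HtmlDiff(difflib.HtmlDiff):
    # HtmlDiff calls difflib's module-level ndiff internally, so we replace it.
    # Note: this patch is global and affects every later difflib.ndiff call.
    def __init__(self, tabsize=8, wrapcolumn=None, linejunk=None,
                 charjunk=difflib.IS_CHARACTER_JUNK):
        # Save the original ndiff only once: saving it again on a second
        # instantiation would store the patched method and recurse forever.
        if not hasattr(difflib, '_ndiff'):
            difflib._ndiff = difflib.ndiff
        difflib.ndiff = self.ndiff
        super().__init__(tabsize, wrapcolumn, linejunk, charjunk)

    def ndiff(self, a, b, linejunk=None, charjunk=difflib.IS_CHARACTER_JUNK):
        _line = ''
        _d = '-'
        for line in difflib._ndiff(a, b, linejunk, charjunk):
            if line.startswith('-'):
                _d = '-'          # marker to look for in the following '?' line
                _line = line
            elif line.startswith('+'):
                _d = '+'
                _line = line

            if line.startswith('?'):
                # first highlighted column (only this part is widened)
                dp = line.find(_d)
                if dp == -1:
                    # only changed ('^') characters: mark them with the line's own sign
                    dp = line.find('^')
                # field start: just after the previous comma, or after the 2-char prefix
                dpl = _line.rfind(',', 0, dp)
                if dpl == -1:
                    dpl = 2
                else:
                    dpl += 1
                # field end: the next comma, or the end of the line for the last field
                dpr = _line.find(',', dp)
                if dpr == -1:
                    dpr = len(_line)
                if dpr == dp:
                    # the highlight starts on a comma: unmark just that comma
                    _d = ' '
                    dpl = dp
                    dpr = dp + 1

                dpw = dpr - dpl
                line = line[:dpl] + _d * dpw + line[dpr:]

            yield line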

# Usage
diff = Comma_HtmlDiff().make_file(fromlines, tolines, fromfile,
                                  tofile, context=True,
                                  numlines=0)

Output: (screenshot not preserved)

Tested with Python 3.4.2

Regarding "Python difflib: compare two CSV files and highlight word-level differences in the HTML output", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44317465/
