- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
除了几年前在 VB 中完成的简短小事外,我还没有真正用 VBA 或类似的东西编写过代码。这是我尝试编写一些代码来搜索客户帐户的 excel 表数据库并搜索可能的重复帐户。可悲的是,在我需要运行它的机器上,它只能处理大约 3,500 个条目而不会导致 Excel 崩溃。我将这归因于我的代码非常未优化以及机器运行缓慢。
可以做些什么来优化以下代码,以及将来我应该使用哪些 VBA 最佳实践?
'Essentially, this loops through each row in the sheet
'For each row, it loops through every row after it, searching for duplicates of itself (skipping over a rows that have previously been marked as duplicates)
'Duplicates are defined by entries that meet a 'threshhold' of similarity
'The threshhold is defined as the number '5', first and last names are each two points, address and email address are one point
'That means that in order for an entry to meet the thresshold, the first and last name must be the same, and it must also have either the same address or email
'When duplicates are found, the duplicate column is marked as 'Yes' for that row, and the first occurence column is marked with a number defining the row number where the account first appeared
Sub Main():
Dim lNameCol, fNameCol, addressCol, emailCol, duplicateCol, fOccurenceCol As String
'Defines the column letters for the various data fields
lNameCol = "A"
fNameCol = "B"
addressCol = "C"
emailCol = "D"
duplicateCol = "E" 'The column where a entry/row will be marked as being a duplicate
fOccurenceCol = "F" 'The column that contains the row number where a duplicate accounts first occurence was found
Call Duplicates(lNameCol, fNameCol, addressCol, emailCol, duplicateCol, fOccurenceCol)
End Sub
'Gets number of rows in currently active sheet
Function RowCount():
Application.ActiveSheet.UsedRange
RowCount = Worksheets("Sheet1").UsedRange.Rows.Count
End Function
'Finds and labels duplicates
Sub Duplicates(ByVal lNameCol As String, ByVal fNameCol As String, ByVal addressCol As String, ByVal emailCol As String, ByVal duplicateCol As String, ByVal fOccurenceCol As String)
Dim lRowCount As Integer
lRowCount = RowCount()
'Loops through each row in the sheet
For i = 1 To lRowCount
Dim duplicate, lastName, firstName, email, address As String
'Sets these variables' values corresponding cell value in row 'i'
'UCase capitilizes things to make entries case-insensitive
duplicate = UCase(Range(duplicateCol & i).Value)
lastName = UCase(Range(lNameCol & i).Value)
firstName = UCase(Range(fNameCol & i).Value)
email = UCase(Range(emailCol & i).Value)
address = UCase(Range(addressCol & i).Value)
'Checks to make sure row has not already been marked a duplicate, if it hasn't it continues
If (StrComp(duplicate = "YES", vbTextCompare) = 1) Then
'Loops through every row after the current row (row 'i')
For n = (i + 1) To lRowCount
'duplicateThreshold is an integer that defines the threshhold of similarity that rows need to have in order to be labeled a duplicate
Dim duplicateThreshhold As Integer
Dim lastName2, firstName2, email2, address2 As String
duplicateThreshhold = 0
'These are the entry variables for account entry at row 'n' being compared to the account entry at row 'i'
lastName2 = UCase(Range(lNameCol & n).Value)
firstName2 = UCase(Range(fNameCol & n).Value)
email2 = UCase(Range(emailCol & n).Value)
address2 = UCase(Range(addressCol & n).Value)
'Adds 2 points to threshhold if first name is the same
If lastName = lastName2 Then
duplicateThreshhold = duplicateThreshhold + 2
End If
'Adds 2 points to threshold if last name is the same
If firstName = firstName2 Then
duplicateThreshhold = duplicateThreshhold + 2
End If
'The remaining two fields give 1 point each to the thresshold
'As long as the sum of the points given by first and last name is always greater than half of the threshhold, first and last name will always be required
If email = email2 Or address = address2 Then
duplicateThreshhold = duplicateThreshhold + 1
End If
If duplicateThreshhold > 4 Then
'Labels duplicate entries as duplicates
Range(duplicateCol & i).Value = "Yes"
Range(duplicateCol & n).Value = "Yes"
'Labels duplicate entries with the first occurence of that entry
Range(fOccurenceCol & i).Value = i 'Labels first occurence account's row number
Range(fOccurenceCol & n).Value = i
End If
Next
End If
Next
End Sub
最佳答案
好的,这是困扰我的问题之一,所以我必须解决它(非常感谢@RJGordon!)。我最终以两种不同的方式解决了它——第一种使用嵌套循环,第二种使用散列字典。第二个是一种更简洁、更快的算法,但为了彻底起见,我将同时介绍这两种算法。
嵌套循环
正如@JohnColeman 指出的那样,这种方法在逻辑上是有意义的,但扩展性非常大。为每条记录提供所有重复行的列表很容易,并且具有标记数据集中第一行的优点。 (下面的第二个解决方案不会用下面的重复项标记初始记录,但如果需要,您也可以解决这个问题。)
Option Explicit
Sub test()
MarkDuplicates ActiveSheet, 1, 2, 3, 4, 5, 6
End Sub
Sub MarkDuplicates(sh As Worksheet, lNameCol As Long, _
fNameCol As Long, addressCol As Long, _
emailCol As Long, duplicateCol As Long, _
fOccuranceCol As Long)
Dim lastRow As Long
Dim lastCol As Long
Dim acctRange As Range
Dim acctData As Variant
Dim checkRow As Long
Dim otherRow As Long
Dim dupScore As Integer
Dim dupList As String
'--- determine the range of data and copy to a memory-based array
lastRow = sh.Cells(sh.Rows.Count, "A").End(xlUp).Row
lastCol = sh.Cells(1, sh.Columns.Count).End(xlToLeft).Column
Set acctRange = sh.Range("A1").Resize(lastRow, lastCol)
acctData = acctRange
'--- nested loop to check each row against every other row
For checkRow = 2 To lastRow
dupList = ""
For otherRow = 2 To lastRow
dupScore = 0
If otherRow <> checkRow Then
If acctData(checkRow, lNameCol) = acctData(otherRow, lNameCol) Then
dupScore = dupScore + 2
End If
If acctData(checkRow, fNameCol) = acctData(otherRow, fNameCol) Then
dupScore = dupScore + 2
End If
If acctData(checkRow, addressCol) = acctData(otherRow, addressCol) Then
dupScore = dupScore + 1
End If
If acctData(checkRow, emailCol) = acctData(otherRow, emailCol) Then
dupScore = dupScore + 1
End If
If dupScore > 4 Then
dupList = dupList & otherRow & ","
End If
End If
Next otherRow
If Len(dupList) > 0 Then
dupList = Left(dupList, Len(dupList) - 1)
acctData(checkRow, duplicateCol) = "Yes"
acctData(checkRow, fOccuranceCol) = dupList
Else
acctData(checkRow, duplicateCol) = ""
acctData(checkRow, fOccuranceCol) = ""
End If
Next checkRow
'--- copy the array back to the worksheet
acctRange = acctData
Set sh = Nothing
End Sub
Option Explicit
Sub test()
MarkDuplicates ActiveSheet, 1, 2, 3, 4, 5, 6
End Sub
Sub MarkDuplicates(sh As Worksheet, lNameCol As Long, _
fNameCol As Long, addressCol As Long, _
emailCol As Long, duplicateCol As Long, _
fOccuranceCol As Long)
Dim lastRow As Long
Dim lastCol As Long
Dim acctRange As Range
Dim acctData As Variant
Dim acctDict1 As Dictionary
Dim acctDict2 As Dictionary
Dim acctDict3 As Dictionary
Dim acctKey As String
Dim checkRow As Long
Dim otherRow As Long
Dim dupScore As Integer
Dim dupList As String
'--- determine the range of data and copy to a memory-based array
lastRow = sh.Cells(sh.Rows.Count, "A").End(xlUp).Row
lastCol = sh.Cells(1, sh.Columns.Count).End(xlToLeft).Column
Set acctRange = sh.Range("A1").Resize(lastRow, lastCol)
acctData = acctRange
Set acctDict1 = New Dictionary
Set acctDict2 = New Dictionary
Set acctDict3 = New Dictionary
'--- build the initial dictionary
' for the key to trip as duplicate, there are three possible
' combinations to check, so we make three dictionaries and
' create keys as combinations of the fields
For checkRow = 2 To lastRow
'--- clear previous flags
acctData(checkRow, duplicateCol) = ""
acctData(checkRow, fOccuranceCol) = ""
'--- dupe is lastname + firstname
acctKey = acctData(checkRow, lNameCol) & acctData(checkRow, fNameCol)
If Not acctDict1.Exists(acctKey) Then
acctDict1.Add acctKey, checkRow
ElseIf acctData(checkRow, duplicateCol) <> "Yes" Then
acctData(checkRow, duplicateCol) = "Yes1"
acctData(checkRow, fOccuranceCol) = acctDict1.Item(acctKey)
End If
'--- dupe is lastname + address + email
acctKey = acctData(checkRow, lNameCol) & acctData(checkRow, addressCol) & _
acctData(checkRow, emailCol)
If Not acctDict2.Exists(acctKey) Then
acctDict2.Add acctKey, checkRow
ElseIf acctData(checkRow, duplicateCol) <> "Yes" Then
acctData(checkRow, duplicateCol) = "Yes2"
acctData(checkRow, fOccuranceCol) = acctDict2.Item(acctKey)
End If
'--- dupe is firstname + address + email
acctKey = acctData(checkRow, fNameCol) & acctData(checkRow, addressCol) & _
acctData(checkRow, emailCol)
If Not acctDict3.Exists(acctKey) Then
acctDict3.Add acctKey, checkRow
ElseIf acctData(checkRow, duplicateCol) <> "Yes" Then
acctData(checkRow, duplicateCol) = "Yes3"
acctData(checkRow, fOccuranceCol) = acctDict3.Item(acctKey)
End If
Next checkRow
'--- copy the array back to the worksheet
acctRange = acctData
Set sh = Nothing
End Sub
关于vba - 优化 VBA/Excel 宏代码(在大工作表中查找重复项),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35893062/
比较代码: const char x = 'a'; std::cout > (0C310B0h) 00C3100B add esp,4 和 const i
您好,我正在使用 Matlab 优化求解器,但程序有问题。我收到此消息 fmincon 已停止,因为目标函数值小于目标函数限制的默认值,并且约束满足在约束容差的默认值范围内。我也收到以下消息。警告:矩
处理Visual Studio optimizations的问题为我节省了大量启动和使用它的时间 当我必须进行 J2EE 开发时,我很难回到 Eclipse。因此,我还想知道人们是否有任何提示或技巧可
情况如下:在我的 Excel 工作表中,有一列包含 1-name 形式的条目。考虑到数字也可以是两位数,我想删除这些数字。这本身不是问题,我让它工作了,只是性能太糟糕了。现在我的程序每个单元格输入大约
这样做有什么区别吗: $(".topHorzNavLink").click(function() { var theHoverContainer = $("#hoverContainer");
这个问题已经有答案了: 已关闭11 年前。 Possible Duplicate: What is the cost of '$(this)'? 我经常在一些开发人员代码中看到$(this)引用同一个
我刚刚结束了一个大型开发项目。我们的时间紧迫,因此很多优化被“推迟”。既然我们已经达到了最后期限,我们将回去尝试优化事情。 我的问题是:优化 jQuery 网站时您要寻找的最重要的东西是什么。或者,我
所以我一直在用 JavaScript 编写游戏(不是网络游戏,而是使用 JavaScript 恰好是脚本语言的游戏引擎)。不幸的是,游戏引擎的 JavaScript 引擎是 SpiderMonkey
这是我在正在构建的页面中使用的 SQL 查询。它目前运行大约 8 秒并返回 12000 条记录,这是正确的,但我想知道您是否可以就如何使其更快提出可能的建议? SELECT DISTINCT Adve
如何优化这个? SELECT e.attr_id, e.sku, a.value FROM product_attr AS e, product_attr_text AS a WHERE e.attr
我正在使用这样的结构来测试是否按下了所需的键: def eventFilter(self, tableView, event): if event.type() == QtCore.QEven
我正在使用 JavaScript 从给定的球员列表中计算出羽毛球 double 比赛的所有组合。每个玩家都与其他人组队。 EG。如果我有以下球员a、b、c、d。它们的组合可以是: a & b V c
我似乎无法弄清楚如何让这个 JS 工作。 scroll function 起作用但不能隐藏。还有没有办法用更少的代码行来做到这一点?我希望 .down-arrow 在 50px 之后 fade out
我的问题是关于用于生产的高级优化级联样式表 (CSS) 文件。 多么最新和最完整(准备在实时元素中使用)的 css 优化器/最小化器,它们不仅提供删除空格和换行符,还提供高级功能,如删除过多的属性、合
我读过这个: 浏览器检索在 中请求的所有资源开始呈现 之前的 HTML 部分.如果您将请求放在 中section 而不是,那么页面呈现和下载资源可以并行发生。您应该从 移动尽可能多的资源请求。
我正在处理一些现有的 C++ 代码,这些代码看起来写得不好,而且调用频率很高。我想知道我是否应该花时间更改它,或者编译器是否已经在优化问题。 我正在使用 Visual Studio 2008。 这是一
我正在尝试使用 OpenGL 渲染 3 个四边形(1 个背景图,2 个 Sprite )。我有以下代码: void GLRenderer::onDrawObjects(long p_dt) {
我确实有以下声明: isEnabled = false; if(foo(arg) && isEnabled) { .... } public boolean foo(arg) { some re
(一)深入浅出理解索引结构 实际上,您可以把索引理解为一种特殊的目录。微软的SQL SERVER提供了两种索引:聚集索引(clustered index,也称聚类索引、簇集索引)和非聚集索引(no
一、写在前面 css的优化方案,之前没有提及,所以接下来进行总结一下。 二、具体优化方案 2.1、加载性能 1、css压缩:将写好的css进行打包,可以减少很多的体积。 2、css单一样式:在需要下边
我是一名优秀的程序员,十分优秀!