gpt4 book ai didi

vba - 计算包含大量文本的 Excel 列中最常用的单词?

转载 作者:行者123 更新时间:2023-12-02 13:30:54 26 4
gpt4 key购买 nike

我有一个大型电子表格,我想对特定列执行字数统计,以找出最常用的单词。该列包含大量数据和文本。

例如,“员工正在爬梯子,从最上面的货架上取回商品。梯子开始摇晃,员工失去平衡摔倒。右腿受伤”。类似这样的不同记录大约有1000条。我希望使用数据透视表来找出该列中所有单元格中最常用的单词。

我不知道该怎么做。任何人都可以协助如何做到这一点吗?

当前使用以下代码:

Option Explicit

Sub MakeWordList()
Dim InputSheet As Worksheet
Dim WordListSheet As Worksheet
Dim PuncChars As Variant, x As Variant
Dim i As Long, r As Long
Dim txt As String
Dim wordCnt As Long
Dim AllWords As Range
Dim PC As PivotCache
Dim PT As PivotTable

Application.ScreenUpdating = False
Set InputSheet = ActiveSheet
Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
WordListSheet.Range("A1") = "All Words"
WordListSheet.Range("A1").Font.Bold = True
InputSheet.Activate
wordCnt = 2
PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
"$", "%", "&", "(", ")", " - ", "_", "--", "+", _
"=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
r = 1

' Loop until blank cell is encountered
Do While Cells(r, 1) <> ""
' covert to UPPERCASE
txt = UCase(Cells(r, 1))
' Remove punctuation
For i = 0 To UBound(PuncChars)
txt = Replace(txt, PuncChars(i), "")
Next i
' Remove excess spaces
txt = WorksheetFunction.Trim(txt)
' Extract the words
x = Split(txt)
For i = 0 To UBound(x)
WordListSheet.Cells(wordCnt, 1) = x(i)
wordCnt = wordCnt + 1
Next i
r = r + 1
Loop

' Create pivot table
WordListSheet.Activate
Set AllWords = Range("A1").CurrentRegion
Set PC = ActiveWorkbook.PivotCaches.Add _
(SourceType:=xlDatabase, _
SourceData:=AllWords)
Set PT = PC.CreatePivotTable _
(TableDestination:=Range("C1"), _
TableName:="PivotTable1")
With PT
.AddDataField .PivotFields("All Words")
.PivotFields("All Words").Orientation = xlRowField
End With
End Sub

最佳答案

这是一个快速而肮脏的宏(我今天感觉特别有帮助)。将其放入您的工作簿模块中。注意:我假设您将激活的工作表是 A 列中包含所有文本的工作表。

Sub Test()
Dim lastRow&, i&, tempLastRow&
Dim rawWS As Worksheet, tempWS As Worksheet

Set rawWS = ActiveSheet
Set tempWS = Sheets.Add
tempWS.Name = "Temp"
rawWS.Activate

'tempWS.Columns(1).Value = rawWS.Columns(1).Value
tempLastRow = 1

With rawWS
.Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
Semicolon:=False, Comma:=False, Space:=True, Other:=False, TrailingMinusNumbers:=True

lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row

For i = lastRow To 1 Step -1
.Rows(i).EntireRow.Copy
tempWS.Range("A" & tempLastRow).PasteSpecial Paste:=xlPasteAll, Transpose:=True
' tempWS.Range ("A" & tempLastRow)
tempLastRow = tempWS.Cells(tempWS.Rows.Count, 1).End(xlUp).Row + 1
Next i
Application.CutCopyMode = False
End With

With tempWS
' Now, let's get unique words and run a count
.Range("A:A").Copy .Range("C:C")
.Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
tempLastRow = .Cells(.Rows.Count, 3).End(xlUp).Row

.Range(.Cells(1, 4), .Cells(tempLastRow, 4)).FormulaR1C1 = "=COUNTIF(C[-3],RC[-1])"
.Sort.SortFields.Clear
.Sort.SortFields.Add Key:=Range("D1:D1048576") _
, SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
With .Sort
.SetRange Range("C1:D1048576")
.Header = xlGuess
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With

End With

End Sub

基本上,它创建一个新工作表,计算所有单个单词的数量,并将单词(和计数)放入一列中,按最常用的顺序排序。您可以根据需要进行调整。

注意:我在你添加代码之前就做了这个。它不会创建数据透视表,但据我了解您的需要,如果您只需要最常用的单词,数据透视表就太过分了。但是,如果您需要任何编辑或更改,请告诉我!

关于vba - 计算包含大量文本的 Excel 列中最常用的单词?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33902129/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com