gpt4 book ai didi

excel - Excel 中的富文本格式(带有格式化标签)为无格式文本

转载 作者:行者123 更新时间:2023-12-02 13:38:58 29 4
gpt4 key购买 nike

我有大约。 Excel 中包含 RTF(包括格式标签)的 12000 个单元格。我需要解析它们以获得未格式化的文本。

这是带有文本的单元格之一的示例:

{\rtf1\ansi\deflang1060\ftnbj\uc1
{\fonttbl{\f0 \froman \fcharset0 Times New Roman;}{\f1 \fswiss \fcharset238
Arial;}}
{\colortbl ;\red255\green255\blue255 ;\red0\green0\blue0 ;}
{\stylesheet{\fs24\cf2\cb1 Normal;}{\cs1\cf2\cb1 Default Paragraph Font;}}
\paperw11908\paperh16833\margl1800\margr1800\margt1440\margb1440\headery720\footery720
\deftab720\formshade\aendnotes\aftnnrlc\pgbrdrhead\pgbrdrfoot
\sectd\pgwsxn11908\pghsxn16833\marglsxn1800\margrsxn1800\margtsxn1440\margbsxn1440
\headery720\footery720\sbkpage\pgncont\pgndec
\plain\plain\f1\fs24\pard TPR 0160 000\par IPR 0160 000\par OB-R-02-28\par}

我真正需要的是这个:

TPR 0160 000
IPR 0160 000
OB-R-02-28

简单循环单元格并删除不必要的格式的问题是,并不是这 12000 个单元格中的所有内容都像这样简单。所以我需要手动检查许多不同的版本并编写几个变体;但最终还是有很多手工工作要做。

但是,如果我将一个单元格的内容复制到空文本文档并将其另存为 RTF,然后用 MS Word 打开它,它会立即解析文本,我就得到了我想要的内容。不幸的是,对于 12000 个单元来说这样做非常不方便。

所以我正在考虑 VBA 宏,将单元格内容移动到 Word,强制解析,然后将结果复制回原始单元格。不幸的是我不太确定该怎么做。

有人有任何想法吗?或者有不同的方法?我将非常感谢您提供解决方案或插入正确的方向。

TNX!

最佳答案

如果您确实想使用 Word 解析文本,此函数应该可以帮助您。正如评论所建议的,您需要 MS Word 对象库的引用。

Function ParseRTF(strRTF As String) As String
Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'
Dim f As Integer 'Variable to store the file I/O number'

'File path for a temporary .rtf file'
Const strFileTemp = "C:\TempFile_ParseRTF.rtf"

'Obtain the next valid file I/O number'
f = FreeFile

'Open the temp file and save the RTF string in it'
Open strFileTemp For Output As #f
Print #f, strRTF
Close #f

'Open the .rtf file as a Word.Document'
Set wdDoc = GetObject(strFileTemp)

'Read the now parsed text from the Word.Document'
ParseRTF = wdDoc.Range.Text

'Delete the temporary .rtf file'
Kill strFileTemp

'Close the Word connection'
wdDoc.Close False
Set wdDoc = Nothing
End Function

您可以使用类似于以下的内容为 12,000 个单元中的每一个单元调用它:

Sub ParseAllRange()
Dim rngCell As Range
Dim strRTF As String

For Each rngCell In Range("A1:A12000")

'Parse the cell contents'
strRTF = ParseRTF(CStr(rngCell))

'Output to the cell one column over'
rngCell.Offset(0, 1) = strRTF
Next
End Sub

ParseRTF 函数运行大约需要一秒钟(至少在我的机器上),因此对于 12,000 个单元格,大约需要三个半小时。

<小时/>

在周末思考这个问题后,我确信有一个更好(更快)的解决方案。

我记得剪贴板的 RTF 功能,并意识到可以创建一个类,将 RTF 数据复制到剪贴板,粘贴到 Word 文档,并输出生成的纯文本。该解决方案的好处是不必为每个 rtf 字符串打开和关闭单词 doc 对象;它可以在循环之前打开并在循环之后关闭。

下面是实现此目的的代码。它是一个名为 clsRTFParser 的类模块。

Private Declare Function GlobalAlloc Lib "kernel32" _
(ByVal wFlags&, ByVal dwBytes As Long) As Long
Private Declare Function GlobalLock Lib "kernel32" _
(ByVal hMem As Long) As Long
Private Declare Function GlobalUnlock Lib "kernel32" _
(ByVal hMem As Long) As Long
Private Declare Function lstrcpy Lib "kernel32" _
(ByVal lpString1 As Any, ByVal lpString2 As Any) As Long

Private Declare Function OpenClipboard Lib "user32" _
(ByVal Hwnd As Long) As Long
Private Declare Function EmptyClipboard Lib "user32" () As Long
Private Declare Function RegisterClipboardFormat Lib "user32" Alias _
"RegisterClipboardFormatA" (ByVal lpString As String) As Long
Private Declare Function SetClipboardData Lib "user32" _
(ByVal wFormat As Long, ByVal hMem As Long) As Long
Private Declare Function CloseClipboard Lib "user32" () As Long

'---'

Dim wdDoc As Word.Document 'Ref: Microsoft Word 11.0 Object Library'

Private Sub Class_Initialize()
Set wdDoc = New Word.Document
End Sub

Private Sub Class_Terminate()
wdDoc.Close False
Set wdDoc = Nothing
End Sub

'---'

Private Function CopyRTF(strCopyString As String) As Boolean
Dim hGlobalMemory As Long
Dim lpGlobalMemory As Long
Dim hClipMemory As Long
Dim lngFormatRTF As Long

'Allocate and copy string to memory'
hGlobalMemory = GlobalAlloc(&H42, Len(strCopyString) + 1)
lpGlobalMemory = GlobalLock(hGlobalMemory)
lpGlobalMemory = lstrcpy(lpGlobalMemory, strCopyString)

'Unlock the memory and then copy to the clipboard'
If GlobalUnlock(hGlobalMemory) = 0 Then
If OpenClipboard(0&) <> 0 Then
Call EmptyClipboard

'Save the data as Rich Text Format'
lngFormatRTF = RegisterClipboardFormat("Rich Text Format")
hClipMemory = SetClipboardData(lngFormatRTF, hGlobalMemory)

CopyRTF = CBool(CloseClipboard)
End If
End If
End Function

'---'

Private Function PasteRTF() As String
Dim strOutput As String

'Paste the clipboard data to the wdDoc and read the plain text result'
wdDoc.Range.Paste
strOutput = wdDoc.Range.Text

'Get rid of the new lines at the beginning and end of the document'
strOutput = Left(strOutput, Len(strOutput) - 2)
strOutput = Right(strOutput, Len(strOutput) - 2)

PasteRTF = strOutput
End Function

'---'

Public Function ParseRTF(strRTF As String) As String
If CopyRTF(strRTF) Then
ParseRTF = PasteRTF
Else
ParseRTF = "Error in copying to clipboard"
End If
End Function

您可以使用类似于以下的内容为 12,000 个单元中的每一个单元调用它:

Sub CopyParseAllRange()
Dim rngCell As Range
Dim strRTF As String

'Create new instance of clsRTFParser'
Dim RTFParser As clsRTFParser
Set RTFParser = New clsRTFParser

For Each rngCell In Range("A1:A12000")

'Parse the cell contents'
strRTF = RTFParser.ParseRTF(CStr(rngCell))

'Output to the cell one column over'
rngCell.Offset(0, 1) = strRTF
Next
End Sub

我已经在我的机器上使用示例 RTF 字符串模拟了这一点。对于 12,000 个细胞,需要两分半钟,这是一个更合理的时间范围!

关于excel - Excel 中的富文本格式(带有格式化标签)为无格式文本,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1673025/

29 4 0