I am trying to extract all the words in a Word document. I can do it all in one pass as follows:
// Assumes: using Word = Microsoft.Office.Interop.Word; plus System.Linq and System.Collections.Generic.
// charLocation is declared elsewhere (a collection indexed by int).
Word.Application word = new Word.Application();
Word.Document doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();
foreach (Word.Range docRange in doc.Words) // loads all words in document
{
    IEnumerable<string> sortedSubstrings = Enumerable.Range(0, docRange.Text.Trim().Length)
        .Select(i => docRange.Text.Substring(i))
        .OrderBy(s => s.Length < 3 ? s : s.Remove(2, Math.Min(s.Length - 2, 2)));
    int wordPosition = (int)docRange.get_Information(
        Word.WdInformation.wdFirstCharacterColumnNumber);
    foreach (var substring in sortedSubstrings)
    {
        int index = docRange.Text.IndexOf(substring) + wordPosition;
        charLocation[index] = substring;
    }
}
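However, I would prefer to load the document one line at a time. Is that possible?
I can load it paragraph by paragraph, but I cannot iterate over a paragraph to extract all of its words.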
foreach (Word.Paragraph para in doc.Paragraphs)
{
    foreach (Word.Range docRange in para) // Error: type Word.Paragraph is not enumerable
    {
        IEnumerable<string> sortedSubstrings = Enumerable.Range(0, docRange.Text.Trim().Length)
            .Select(i => docRange.Text.Substring(i))
            .OrderBy(s => s.Length < 3 ? s : s.Remove(2, Math.Min(s.Length - 2, 2)));
        int wordPosition = (int)docRange.get_Information(
            Microsoft.Office.Interop.Word.WdInformation.wdFirstCharacterColumnNumber);
        foreach (var substring in sortedSubstrings)
        {
            int index = docRange.Text.IndexOf(substring) + wordPosition;
            charLocation[index] = substring;
        }
    }
}
Best answer
This will help you get the text line by line.
object file = Path.GetDirectoryName(Application.ExecutablePath) + @"\Answer.doc";
Word.Application wordObject = new Word.ApplicationClass();
wordObject.Visible = false;
object nullobject = Missing.Value;
Word.Document docs = wordObject.Documents.Open
    (ref file, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject,
    ref nullobject, ref nullobject, ref nullobject, ref nullobject);
String strLine;
bool bolEOF = false;
// Start with the selection on the first character of the document.
docs.Characters[1].Select();
int index = 0;
do
{
    // Extend the selection to the end of the current displayed line.
    object unit = Word.WdUnits.wdLine;
    object count = 1;
    wordObject.Selection.MoveEnd(ref unit, ref count);
    strLine = wordObject.Selection.Text;
    richTextBox1.Text += ++index + " - " + strLine + "\r\n"; // for our understanding
    // Collapse to the end of the selection so the next pass starts on the next line.
    object direction = Word.WdCollapseDirection.wdCollapseEnd;
    wordObject.Selection.Collapse(ref direction);
    // The predefined \EndOfDoc bookmark marks the end of the document.
    if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
        bolEOF = true;
} while (!bolEOF);
docs.Close(ref nullobject, ref nullobject, ref nullobject);
wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
docs = null;
wordObject = null;
Here is the genius behind the code. Follow the link for more explanation of how it works.
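As for the paragraph-based attempt in the question, the compile error comes from enumerating the Word.Paragraph object itself, which is a single object rather than a collection. Its Range property does expose a Words collection, so one possible fix is to loop over para.Range.Words instead. The following is only a minimal sketch under that assumption: the class name ParagraphWords and the method name Enumerate are hypothetical, and the question's substring and position logic is left out.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class ParagraphWords // hypothetical helper, not part of the original answer
{
    // Yields the trimmed text of each word, one paragraph at a time.
    public static IEnumerable<string> Enumerate(Word.Document doc)
    {
        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            // Paragraph is not enumerable; its Range exposes the Words collection.
            foreach (Word.Range wordRange in para.Range.Words)
            {
                // The paragraph mark counts as a "word"; trimming leaves it empty.
                string text = (wordRange.Text ?? string.Empty).Trim();
                if (text.Length > 0)
                    yield return text;
            }
        }
    }
}
```

With an open document, `foreach (string w in ParagraphWords.Enumerate(doc))` would then visit the words paragraph by paragraph. Note that this iterates by paragraph, not by displayed line, so it complements the Selection-based loop above rather than replacing it.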
Regarding "c# - Is there a way to read a Word document line by line", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6924056/