java - Parsing a string with the default methods

Reposted by: 行者123 · Updated: 2023-12-01 14:28:55

I use the following code to extract text from an .odt file:

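// Imports are assumed: the classes used below match the JDOM 1.x package names
// (org.jdom); JDOM 2 uses org.jdom2 instead.
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.Text;
import org.jdom.input.SAXBuilder;

public class OpenOfficeParser {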

StringBuffer TextBuffer;

public OpenOfficeParser() {}

//Process text elements recursively
public void processElement(Object o) {

    if (o instanceof Element) {

        Element e = (Element) o;
        String elementName = e.getQualifiedName();

        if (elementName.startsWith("text")) {

            if (elementName.equals("text:tab")) // add tab for text:tab
                TextBuffer.append("\\t");
            else if (elementName.equals("text:s"))  // add space for text:s
                TextBuffer.append(" ");
            else {
                List children = e.getContent();
                Iterator iterator = children.iterator();

                while (iterator.hasNext()) {

                    Object child = iterator.next();
                    //If Child is a Text Node, then append the text
                    if (child instanceof Text) { 
                        Text t = (Text) child;
                        TextBuffer.append(t.getValue());
                    }
                    else
                    processElement(child); // Recursively process the child element                   
                }                   
            }
            if (elementName.equals("text:p"))
                TextBuffer.append("\\n");                   
        }
        else {
            List non_text_list = e.getContent();
            Iterator it = non_text_list.iterator();
            while (it.hasNext()) {
                Object non_text_child = it.next();
                processElement(non_text_child);                   
            }
        }               
    }
}

public String getText(String fileName) throws Exception {
    TextBuffer = new StringBuffer();

    //Unzip the openOffice Document
    ZipFile zipFile = new ZipFile(fileName);
    Enumeration entries = zipFile.entries();
    ZipEntry entry;

    while(entries.hasMoreElements()) {
        entry = (ZipEntry) entries.nextElement();

        if (entry.getName().equals("content.xml")) {

            TextBuffer = new StringBuffer();               
            SAXBuilder sax = new SAXBuilder();
            Document doc = sax.build(zipFile.getInputStream(entry));
            Element rootElement = doc.getRootElement();
            processElement(rootElement);
            break;
        }
    }    


 System.out.println("The text extracted from the OpenOffice document = " + TextBuffer.toString());
        return TextBuffer.toString();       
    }     
}

The problem appears when I use the string returned by getText(). I ran the program and extracted some text from an .odt file; here is a piece of the extracted text:

(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....

So I tried this:

System.out.println( TextBuffer.toString().split("\\n"));

The output I get is:

substring: [Ljava.lang.String;@505bb829

I also tried this:

System.out.println( TextBuffer.toString().trim() );

But the printed string is unchanged.

Why does this happen? What should I do to parse this string correctly? Also, if I want to add each substring ending with "\n\n" to array[i], how should I do that?

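Edit: Sorry, I made a mistake in my example because I forgot that split() returns an array. The real problem is that it returns an array with only one element, so what I am asking is why this: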

System.out.println(Arrays.toString(TextBuffer.toString().split("\\n")));

has no effect on the string I wrote in the example.

And this:

    System.out.println( TextBuffer.toString().trim() );

has no effect on the original string either; it just prints the original string.

To show why I want to use split(): I want to parse that string and put each substring ending with "\n" into an array of lines. Here is an example:

My original string:

    (no hi virtual x oy)\n\n house cat \n open it \n\n trying to....

After parsing, I would print each line of the array, and the output should be:

line 1: (no hi virtual x oy)
line 2: house cat
line 3: open it
line 4: trying to
and so on.....

Best answer

If I understand your question correctly, I would do something like this:

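// Requires java.util.ArrayList, java.util.Arrays and java.util.List.
String str = "(no hi virtual x oy)\n\n house cat \n open it \n\n trying to....";

// split() takes a regex; "\\n" is the regex \n, which matches a real line-feed character.
List<String> al = new ArrayList<String>(Arrays.asList(str.split("\\n")));

al.removeAll(Arrays.asList("", null)); // remove empty or null strings

for (int i = 0; i < al.size(); i++) {
    // trim() strips the spaces around each line before printing
    System.out.println("Line " + i + " : " + al.get(i).trim());
}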

Output:

Line 0 : (no hi virtual x oy)
Line 1 : house cat
Line 2 : open it
Line 3 : trying to....
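
A possible explanation for why the same split() returned a single element in the question, assuming the parser code shown there is exactly what was run: processElement appends "\\t" and "\\n", which in Java source are two-character sequences (a backslash followed by a letter), not a tab or a line feed. The extracted text would then contain a visible backslash and 'n', which the regex \n never matches and which trim() does not treat as whitespace. The sketch below illustrates that case; the class name and the helper splitOnLiteralBackslashN are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LiteralNewlineSplit {

    // Hypothetical helper: splits on the two-character sequence backslash + 'n',
    // trims each piece and drops the empty ones.
    static List<String> splitOnLiteralBackslashN(String text) {
        List<String> lines = new ArrayList<String>();
        // Pattern.quote makes the regex engine treat backslash + 'n' literally
        for (String part : text.split(Pattern.quote("\\n"))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same sample as the question, but with backslash + 'n' as two real characters
        String extracted = "(no hi virtual x oy)\\n\\n house cat \\n open it \\n\\n trying to....";

        // The regex \n matches only a real line feed, so nothing is split here
        System.out.println(extracted.split("\\n").length); // prints 1

        List<String> lines = splitOnLiteralBackslashN(extracted);
        for (int i = 0; i < lines.size(); i++) {
            System.out.println("line " + (i + 1) + ": " + lines.get(i));
        }
    }
}
```

If that assumption holds, another option is to fix the source instead: appending "\t" and "\n" (single backslash) in processElement would put real tab and line-feed characters into the buffer, and the split("\\n") from the answer above would then work unchanged.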

Regarding "java - Parsing a string with the default methods", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16969071/
