java - Filter only the readable characters from a large txt file

Reposted. Author: 太空宇宙. Updated: 2023-11-04 13:29:12
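I have a large .txt file that is full of both readable and unreadable characters. I am trying to write a Java program that creates a new .txt file containing only the readable characters from the original .txt file. Please help me do this. Any code would be appreciated. I am new to Java.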

Best answer

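If "readable" means the letters 'a' to 'z' and 'A' to 'Z', the digits '0' to '9', and the space character, then you can filter out everything else with a simple character check (a regular expression would also work). Note that the method below also converts the result to lowercase: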

public static String removeSpecialCharacters(String sentence) {
    // StringBuilder that collects the characters we keep
    StringBuilder stringB = new StringBuilder();
    // Loop through all the characters of the sentence
    for (char c : sentence.toCharArray()) {
        // Keep only digits, ASCII letters, and spaces
        if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == ' ') {
            stringB.append(c);
        }
    }
    return stringB.toString().toLowerCase();
}
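
The same check can also be written as a regular expression. This is a minimal sketch, assuming the same definition of "readable" (digits, ASCII letters, and spaces). The class name RegexFilter and the constant NOT_READABLE are made up for illustration, and Locale.ROOT is an added assumption so that lower-casing does not depend on the machine's default locale:

```java
import java.util.Locale;
import java.util.regex.Pattern;

class RegexFilter {

    // Matches any single character that is NOT a digit, an ASCII letter, or a space.
    // Compiled once so the pattern is not rebuilt for every line.
    private static final Pattern NOT_READABLE = Pattern.compile("[^0-9A-Za-z ]");

    static String removeSpecialCharacters(String sentence) {
        // Delete every non-matching character, then lowercase the rest.
        return NOT_READABLE.matcher(sentence).replaceAll("").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(removeSpecialCharacters("Hello, World! 123")); // prints "hello world 123"
    }
}
```

Both versions should give the same result for ASCII input. The loop is arguably easier for a beginner to read, while the regex makes the allowed character set easy to change in one place.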

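You can append the returned String to the new .txt file. For example, for each line you read from the old file, pass it through the removeSpecialCharacters() method and append the return value to the new .txt file.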

Following the standard Java documentation on reading and writing files, we can put together the following code:

import static java.nio.file.StandardOpenOption.*;
import java.nio.file.*;
import java.io.*;

public class Main {

    public static void main(String[] args) {
        readFromFile();
    }

    private static void writeToFile(String line) {
        // Filter the line first, then add the line break back
        // (the filter would strip a '\n'), and convert to a byte array.
        byte[] data = (removeSpecialCharacters(line) + System.lineSeparator()).getBytes();
        Path p = Paths.get("/home/user/Desktop/outFile.txt");

        // CREATE makes the file if it is missing; APPEND writes at its end.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(p, CREATE, APPEND))) {
            out.write(data, 0, data.length);
        } catch (IOException x) {
            System.err.println(x);
        }
    }

    private static void readFromFile() {
        Path file = Paths.get("/home/user/Desktop/inFile.txt");
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            // readLine() returns each line without its line terminator
            while ((line = reader.readLine()) != null) {
                writeToFile(line);
            }
        } catch (IOException x) {
            System.err.println(x);
        }
    }

    public static String removeSpecialCharacters(String sentence) {
        // StringBuilder that collects the characters we keep
        StringBuilder stringB = new StringBuilder();
        // Loop through all the characters of the sentence
        for (char c : sentence.toCharArray()) {
            // Keep only digits, ASCII letters, and spaces
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == ' ') {
                stringB.append(c);
            }
        }
        return stringB.toString().toLowerCase();
    }
}
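
For a really large file, opening the output file again for every line can be slow. The following is a hedged sketch of a variant that opens each file once and streams line by line. The names StreamingFilter, keepReadable, and filterFile are hypothetical, and taking the paths from the command line is an assumption. Reading as ISO-8859-1 is also an assumption: that charset maps every byte to a character, so decoding should not fail on bytes that are not valid text, and everything outside the ASCII letters, digits, and space is dropped by the filter anyway. Unlike the code above, this version overwrites the output file instead of appending to it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class StreamingFilter {

    // Keeps digits, ASCII letters, and spaces; lowercases the result.
    static String keepReadable(String line) {
        StringBuilder kept = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z') || c == ' ') {
                kept.append(c);
            }
        }
        return kept.toString().toLowerCase(Locale.ROOT);
    }

    // Reads 'in' line by line and writes the filtered lines to 'out'.
    // Both files are opened once and closed by try-with-resources.
    static void filterFile(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(keepReadable(line));
                writer.newLine(); // restore the line break that readLine() removed
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StreamingFilter <inFile> <outFile>");
            return;
        }
        filterFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Only one line is held in memory at a time, so the size of the input file should not matter much, as long as no single line is enormous.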

Regarding "java - Filter only the readable characters from a large txt file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32352823/
