
java - How to skip files that have already been processed

Reposted. Author: 行者123. Updated: 2023-12-01 09:21:16

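I have an application that iterates through a folder full of files and extracts the text from them. I want the application to record which files it has processed, so that when the program is run again it skips the files in the same folder whose text has already been extracted. Currently I can log the processed files, but when I rerun the program the files are processed again, which slows everything down. What is wrong with the code below, and is there a more efficient way to do this?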

public class Iterator {
    static HashSet<String> myFiles = new HashSet<String>();
    public static Preferences prefs;
    static String filename = "/Files/FilesLogged.txt";
    static String folderName;
    static Path p;

    public Iterator() {
    }

    public static void main(String[] args) throws IOException, SAXException, TikaException, SQLException, ParseException, URISyntaxException, BackingStoreException {
        Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);

        BufferedReader reader = new BufferedReader(new InputStreamReader(ClassLoader.class.getResourceAsStream(filename)), 2048);
        String line = null;
        // Read the already-processed file names from the log so they can be skipped
        while ((line = reader.readLine()) != null) {
            myFiles.add(line);
        }

        // Iterate through each file in the configured folders and record its name in a log.
        // It also checks whether a file has already been read, so that it isn't inserted into the database again on a rerun.
        // Outer loop: each preference key maps to the full path of a folder.
        String[] keys = userPrefs.keys();
        for (String folderName : keys) {
            // Get the folder path from the Prefs and iterate through it
            if (userPrefs.get(folderName, null) != null) {
                loopthrough(userPrefs.get(folderName, null));
            }
        }
        reader.close();
    }

    public static void loopthrough(String folderName) throws IOException, SAXException, TikaException, SQLException, ParseException, URISyntaxException {
        File dir = new File(folderName);
        File[] directoryListing = dir.listFiles();
        if (directoryListing != null) {
            for (File child : directoryListing) {
                if (!myFiles.contains(child.getName())) {
                    Preferences userPrefs = Preferences.userNodeForPackage(TBB_SQLBuilder.class);
                    FileWriter fw = new FileWriter(userPrefs.get("PathForLogger", null), true);

                    BufferedWriter bw = new BufferedWriter(fw, 2048);
                    bw.write(child.getName().toString().trim());
                    bw.newLine();
                    bw.flush();
                    bw.close();
                    fw.close();
                }
            }
        }
    }
}
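The log-based approach in the question can work, but only if the same file is both read at startup and appended to. In the posted code they differ: main reads the classpath resource /Files/FilesLogged.txt, while loopthrough appends to the path stored under the PathForLogger preference, so names written on one run are not what the next run reads unless both happen to resolve to the same file (a resource packaged with the application is generally not something you can append to). loopthrough also never adds new names to myFiles and reopens the log for every file. The following is a minimal sketch of a corrected version, not the asker's code: it assumes one writable log path, stores absolute paths rather than bare names (so files with the same name in different folders do not collide), and uses a hypothetical extractText placeholder instead of the Tika extraction.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

public class ProcessedFileLog {

    // Load the set of already-processed paths; a missing log means nothing was processed yet.
    static Set<String> loadProcessed(Path logFile) throws IOException {
        Set<String> processed = new HashSet<>();
        if (Files.exists(logFile)) {
            for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
                String entry = line.trim();
                if (!entry.isEmpty()) {
                    processed.add(entry);
                }
            }
        }
        return processed;
    }

    // Process each regular file in folder that is not yet logged, appending its absolute path
    // to the same log file that loadProcessed reads. Returns the number of newly processed files.
    static int processFolder(Path folder, Path logFile, Set<String> processed) throws IOException {
        String logKey = logFile.toAbsolutePath().normalize().toString();
        int count = 0;
        // One writer for the whole folder instead of reopening the log for every file.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder);
             BufferedWriter log = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path file : files) {
                String key = file.toAbsolutePath().normalize().toString();
                if (!Files.isRegularFile(file) || key.equals(logKey) || processed.contains(key)) {
                    continue; // skip directories, the log itself, and files from earlier runs
                }
                extractText(file);
                log.write(key);
                log.newLine();
                log.flush(); // record progress immediately so a later failure does not lose it
                processed.add(key);
                count++;
            }
        }
        return count;
    }

    // Hypothetical stand-in for the Tika-based text extraction in the question.
    static void extractText(Path file) {
        System.out.println("Extracting text from " + file);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: folder from the first argument, log file from the second.
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        Path logFile = Paths.get(args.length > 1 ? args[1] : "FilesLogged.txt");
        Set<String> processed = loadProcessed(logFile);
        int n = processFolder(folder, logFile, processed);
        System.out.println("Processed " + n + " new file(s)");
    }
}
```

On a second run with the same log path, loadProcessed returns every path written by the first run, so processFolder skips them and reports zero new files.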

Best answer

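Usually, when processing files, you do the following: the first thing you do when you start processing a file is rename it to ..inprocess or something similar, or move it into an inprocess directory. When processing is finished, rename it to ..done or something similar, or move it into a done directory. That way, when you look for files to process, you can skip the ones that are in progress or already done. It also makes it easy to see what needs to be reprocessed.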
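The answer's move-based workflow can be sketched in Java with java.nio.file. This is a minimal sketch under stated assumptions, not the answerer's code: it uses the "move into a directory" variant, the subdirectory names inprocess and done are hypothetical and created inside the scanned folder, and extractText is a hypothetical placeholder for the real extraction.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MoveWhenDone {

    // Hypothetical subdirectory names; the answer only says "inprocess" and "done" or similar.
    static final String IN_PROCESS_DIR = "inprocess";
    static final String DONE_DIR = "done";

    // Processes every regular file directly inside folder. Returns the number of files processed.
    static int processFolder(Path folder) throws IOException {
        Path inProcess = Files.createDirectories(folder.resolve(IN_PROCESS_DIR));
        Path done = Files.createDirectories(folder.resolve(DONE_DIR));

        // Snapshot the candidates first so moving files does not disturb the directory iteration.
        // Only regular files qualify, so the inprocess and done subdirectories are never scanned.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, Files::isRegularFile)) {
            for (Path file : files) {
                candidates.add(file);
            }
        }

        int count = 0;
        for (Path file : candidates) {
            // Step 1: claim the file by moving it into inprocess before doing any work.
            Path working = Files.move(file, inProcess.resolve(file.getFileName()));
            extractText(working);
            // Step 2: mark it finished by moving it into done.
            Files.move(working, done.resolve(working.getFileName()));
            count++;
        }
        return count;
    }

    // Hypothetical stand-in for the actual text extraction.
    static void extractText(Path file) {
        System.out.println("Extracting text from " + file);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: MoveWhenDone <folder>");
            return;
        }
        System.out.println("Processed " + processFolder(Paths.get(args[0])) + " file(s)");
    }
}
```

No separate log is needed: the file's location is its state. If extractText throws, the file stays in inprocess, which is what makes it easy to see what needs reprocessing. Within one file system, Files.move is typically a cheap rename. Without a replace option it throws if done already contains a file with the same name.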

For this question (java - How to skip files that have already been processed), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40151999/
