
java - Fastest way to read a CSV file in Java

Reposted. Author: 塔克拉玛干. Updated: 2023-11-01 21:32:07

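I have been trying to read several CSV files (about 20 MB) with openCSV, but so far it has been slow. I am reading 4 CSV files and loading them into a heap I designed myself. I would like to know whether there is any other way to do this in less time.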

private Heap<VOMovingViolations> datosHeap; 

public void loadMovingViolations()
{
Runtime garbage = Runtime.getRuntime();
garbage.gc();
try
{
FileReader fileReaderMes1 = new FileReader(FECHAS[0]);
FileReader fileReaderMes2 = new FileReader(FECHAS[1]);
FileReader fileReaderMes3 = new FileReader(FECHAS[2]);
FileReader fileReaderMes4 = new FileReader(FECHAS[3]);
CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();
CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();
CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();
CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();

String[] row;


while((row = enero.readNext()) != null)
{
int objectId = Integer.parseInt(row[0]);
int totalPaid = (int)Double.parseDouble(row[9]);
short fi = Short.parseShort(row[8]);
short penalty1 = Short.parseShort(row[10]);
datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],
row[12],row[14], row[15], row[4], row[3], penalty1));
}

while((row = febrero.readNext()) != null)
{
int objectId = Integer.parseInt(row[0]);
int totalPaid = (int)Double.parseDouble(row[9]);
short fi = Short.parseShort(row[8]);
short penalty1 = Short.parseShort(row[10]);
datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],
row[12],row[14], row[15], row[4], row[3], penalty1));
}

while((row = marzo.readNext()) != null)
{
int objectId = Integer.parseInt(row[0]);
int totalPaid = (int)Double.parseDouble(row[9]);
short fi = Short.parseShort(row[8]);
short penalty1 = Short.parseShort(row[10]);
datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],
row[12],row[14], row[15], row[4], row[3], penalty1));
}

while((row = abril.readNext()) != null)
{
int objectId = Integer.parseInt(row[0]);
int totalPaid = (int)Double.parseDouble(row[9]);
short fi = Short.parseShort(row[8]);
short penalty1 = Short.parseShort(row[10]);
datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],
row[12],row[14], row[15], row[4], row[3], penalty1));
}

}
catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}


}

I would appreciate any help or ideas anyone can give me.
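As an aside on the code above: the four `while` loops are identical except for the reader they consume, so they can be folded into a single loop over the sources. The following is only a minimal sketch of that idea. `MultiCsvLoader` and `loadAll` are hypothetical names, and the naive `split( "," )` is a stand-in for a real CSV parser such as openCSV's `CSVReader`; it does not handle quoted fields or embedded commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class MultiCsvLoader
{
    // Hypothetical helper: reads every source in turn, skips `skipLines` header
    // lines in each one, maps each remaining row to a T, and collects the results.
    // ASSUMPTION: fields never contain commas or quotes, so a plain split is enough.
    static < T > List < T > loadAll ( List < ? extends Reader > sources , int skipLines , Function < String[], T > mapper )
    {
        List < T > out = new ArrayList <>();
        for ( Reader source : sources )
        {
            // try-with-resources closes each source as soon as it has been consumed.
            try ( BufferedReader reader = new BufferedReader( source ) )
            {
                for ( int i = 0 ; i < skipLines ; i++ )
                {
                    reader.readLine();
                }
                String line;
                while ( ( line = reader.readLine() ) != null )
                {
                    if ( line.isEmpty() ) { continue; } // Ignore blank lines.
                    out.add( mapper.apply( line.split( "," , - 1 ) ) );
                }
            } catch ( IOException e )
            {
                throw new UncheckedIOException( e );
            }
        }
        return out;
    }

    public static void main ( String[] args )
    {
        // Two in-memory "files", each with one header line.
        List < Reader > months = List.of(
                new StringReader( "id,paid\n1,10.0\n2,25.5\n" ) ,
                new StringReader( "id,paid\n3,0.0\n" ) );
        List < Integer > paid = loadAll( months , 1 , row -> ( int ) Double.parseDouble( row[ 1 ] ) );
        System.out.println( paid ); // [10, 25, 0]
    }
}
```

In the question's setting, the sources would be one `FileReader` per entry of `FECHAS`, and the mapper would build a `VOMovingViolations` from the row. This removes the duplication but is not expected to change the running time by itself.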

Best answer

tl;dr

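Reading a 20 MB CSV file and instantiating an object for each row takes less than one second in total.

Details

You did not define "slow". So I ran an experiment, a casual benchmark.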

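First, we create a 20 MB file containing 40,000 Person records. Each Person has a French given name and surname, a UUID, and some arbitrary text as a description. The data is written as four columns of CSV. The file is in UTF-8. I used the Apache Commons CSV library for both writing and reading.

Second, we read the file just written. Each row of data is read into memory and then used to instantiate and collect a Person object.

Reading this file and instantiating a Person object for each row took less than one second of elapsed time, roughly 20,000 nanoseconds per row. That actually includes reading the file twice, because we first scan it to count the lines so that we can set the initial capacity of the collecting list. In addition, we parse the hex-string input into the 128-bit value of a UUID, so some of the time goes to data processing rather than pure reading.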
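The two-pass idea (count the lines, then allocate the list once at that size) can be sketched with the JDK alone. This is an illustrative sketch, not the answer's code: `PreSizeSketch`, `countLines`, `readDataRows` and `demo` are hypothetical names, the rows are kept as raw strings, and it assumes one record per line (no line breaks inside quoted fields). The count includes the header line, so the capacity overshoots by one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class PreSizeSketch
{
    // First pass: count lines. The stream holds the file open, so close it when done.
    static int countLines ( Path path ) throws IOException
    {
        try ( Stream < String > lines = Files.lines( path , StandardCharsets.UTF_8 ) )
        {
            return ( int ) lines.count();
        }
    }

    // Second pass: allocate the list once, skip the header, collect the data rows.
    static List < String > readDataRows ( Path path ) throws IOException
    {
        List < String > rows = new ArrayList <>( countLines( path ) );
        try ( BufferedReader reader = Files.newBufferedReader( path , StandardCharsets.UTF_8 ) )
        {
            reader.readLine(); // Skip the header line.
            String line;
            while ( ( line = reader.readLine() ) != null )
            {
                rows.add( line );
            }
        }
        return rows;
    }

    // Hypothetical demo: writes a tiny temp file, reads it back, then deletes it.
    static List < String > demo ()
    {
        try
        {
            Path tmp = Files.createTempFile( "persons" , ".csv" );
            try
            {
                Files.write( tmp , List.of( "givenName,surname" , "Adrien,Martin" , "Claire,Dubois" ) , StandardCharsets.UTF_8 );
                return readDataRows( tmp );
            } finally
            {
                Files.deleteIfExists( tmp );
            }
        } catch ( IOException e )
        {
            throw new UncheckedIOException( e );
        }
    }

    public static void main ( String[] args )
    {
        System.out.println( demo() ); // [Adrien,Martin, Claire,Dubois]
    }
}
```

Whether the extra pass pays off depends on the data; for a list of 40,000 elements the cost of letting `ArrayList` grow on its own is likely small, so the counting pass is optional.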

For Java 16+, define the Person class as a record. We override toString to avoid printing the lengthy description content.

record Person ( String givenName , String surname , UUID id , String description ) 
{
static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

@Override
public String toString ()
{
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" | id='" + id + '\'' +
" }";
}
}

For earlier versions of Java, write a conventional Person class.

package work.basil.example;

import java.util.UUID;

public class Person
{
// Static
static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

// Member variables.
public String givenName, surname, description;
public UUID id;

public Person ( String givenName , String surname , UUID id , String description )
{
this.givenName = givenName;
this.surname = surname;
this.id = id;
this.description = description ;
}

@Override
public String toString ()
{
return "Person{ " +
"givenName='" + givenName + '\'' +
" | surname='" + surname + '\'' +
" | id='" + id + '\'' +
" }";
}
}

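Here is the complete app, which writes and then reads the 20 MB file. Please study and critique it, as I whipped it up quickly and have not double-checked my work.

You will find a write method and a read method. The main method calls both and times the read.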

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed
{
public List < Person > read ( Path path )
{
// TODO: Add a check for valid file existing.

List < Person > list = List.of(); // Default to empty list.
try
{
// Prepare list. Count the lines first so the list can be sized up front.
// `Files.lines` keeps the file open, so close the stream when done.
int initialCapacity;
try ( java.util.stream.Stream < String > lines = Files.lines( path ) )
{
initialCapacity = ( int ) lines.count();
}
list = new ArrayList <>( initialCapacity );

// Read CSV file. For each row, instantiate and collect `Person`.
// try-with-resources closes the reader (and the underlying file) when done.
try ( BufferedReader reader = Files.newBufferedReader( path ) )
{
Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
for ( CSVRecord record : records )
{
String givenName = record.get( "givenName" );
String surname = record.get( "surname" );
UUID id = UUID.fromString( record.get( "id" ) );
String description = record.get( "description" );
// Instantiate `Person` object, and collect it.
Person person = new Person( givenName , surname , id , description );
list.add( person );
}
}
} catch ( IOException e )
{
e.printStackTrace();
}
return list;
}

public void write ( final Path path )
{
ThreadLocalRandom random = ThreadLocalRandom.current();
try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
{
int limit = 40_000; // 40_000 yields about 20 MB of data.
List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , 
"Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , 
"Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , 
"Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );
List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );
for ( int i = 1 ; i <= limit ; i++ )
{
String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
UUID id = UUID.randomUUID();
String description = Person.LOREM_IPSUM;
printer.printRecord( givenName , surname , id , description );
}
} catch ( IOException e )
{
e.printStackTrace();
}
}

public static void main ( final String[] args )
{
// Launch the app.
CsvSpeed app = new CsvSpeed();

// Write.
String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
app.write( pathOutput );
System.out.println( "Writing file: " + pathOutput );

// Read.
long start = System.nanoTime();
Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
List < Person > list = app.read( pathInput );
long stop = System.nanoTime();

// Time.
long elapsed = ( stop - start );
Duration d = Duration.ofNanos( elapsed );
System.out.println( "Reading elapsed: " + d );
System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
System.out.println( "nanos elapsed: " + elapsed + " | list.size: " + list.size() );
}
}

When run:

Writing file: /Users/basilbourque/persons.csv

Reading elapsed: PT0.857816234S

Reading took nanos per row: 21445

nanos elapsed: 857816234 | list.size: 40000
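The per-row figure in that output is just the elapsed nanoseconds divided by the row count, using integer division. A small sketch that redoes the arithmetic from the reported values (the class and method names are hypothetical, chosen for illustration):

```java
import java.time.Duration;

class ThroughputSketch
{
    // Elapsed nanoseconds per row, truncated by integer division as in the app's output.
    static long nanosPerRow ( Duration elapsed , long rows )
    {
        return elapsed.toNanos() / rows;
    }

    // Rows processed per second of elapsed time.
    static double rowsPerSecond ( Duration elapsed , long rows )
    {
        return rows / ( elapsed.toNanos() / 1_000_000_000.0 );
    }

    public static void main ( String[] args )
    {
        Duration elapsed = Duration.parse( "PT0.857816234S" ); // Value reported in the run above.
        long rows = 40_000;                                     // list.size reported in the run above.
        System.out.println( nanosPerRow( elapsed , rows ) );    // 21445
        System.out.println( rowsPerSecond( elapsed , rows ) );  // Roughly 46,630 rows per second.
    }
}
```

At that rate, the four files in the question (about 20 MB each, if they resemble this test data) would be expected to load in a few seconds on comparable hardware, though the real figure depends on row width and on the cost of inserting into the heap.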

Tech stack:

  • Java 11.0.2, Zulu by Azul Systems (built on OpenJDK)
  • Running in IntelliJ 2019.1
  • macOS Mojave
  • MacBook Pro (Retina, 15-inch, Late 2013)
  • Processor: 2.3 GHz Intel Core i7 (4 cores, 8 hyper-threads)
  • 16 GB 1600 MHz DDR3
  • Storage: Apple built-in solid-state drive

Regarding "java - Fastest way to read a CSV file in Java", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55084846/
