
java - Using Python for a problem Java cannot handle because of slow execution

Reposted. Author: 行者123. Updated: 2023-12-01 15:04:14

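I have 3 million rows of data, each with 30 features. It is hard to fit all of it in my computer's memory, and processing it with a learning algorithm is slow. I want to write some code that does random sampling, but in Java, on my PC configuration, it either does not work or takes a very long time to run. I know that writing it in C or C++ could give a better solution, but I am also curious how usable Python is for this case. Is it reasonable to use Python where Java does not work effectively because of slowness and memory limits? Please do not suggest increasing the heap size or similar.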

Best answer

If performance is critical, this is the solution I use.

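import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A column-oriented table on disk: each column is one memory-mapped file of floats,
// so the data lives outside the Java heap.
public class SimpleTable {
    private final List<RandomAccessFile> files = new ArrayList<RandomAccessFile>();
    private final List<FloatBuffer> buffers = new ArrayList<FloatBuffer>();
    private final File baseDir;
    private final int rows;

    private SimpleTable(File baseDir, int rows) {
        this.baseDir = baseDir;
        this.rows = rows;
    }

    // Creates a new, empty table directory and records the row count in a "rows" file.
    // Fails if the directory already exists.
    public static SimpleTable create(String baseName, int rows) throws IOException {
        File baseDir = new File(baseName);
        if (!baseDir.mkdirs()) throw new IOException("Failed to create " + baseName);
        try (PrintWriter pw = new PrintWriter(baseName + "/rows")) {
            pw.println(rows);
        }
        return new SimpleTable(baseDir, rows);
    }

    // Opens an existing table and maps every "*.float" column file, in file-name order.
    public static SimpleTable load(String baseName) throws IOException {
        int rows;
        try (BufferedReader br = new BufferedReader(new FileReader(baseName + "/rows"))) {
            rows = Integer.parseInt(br.readLine().trim());
        }
        File baseDir = new File(baseName);
        SimpleTable table = new SimpleTable(baseDir, rows);
        File[] files = baseDir.listFiles();
        if (files == null) throw new IOException("Failed to list " + baseName);
        Arrays.sort(files);
        for (File file : files) {
            if (!file.getName().endsWith(".float")) continue;
            table.addColumnForFile(file);
        }
        return table;
    }

    private FloatBuffer addColumnForFile(File file) throws IOException {
        RandomAccessFile rw = new RandomAccessFile(file, "rw");
        // A float is 4 bytes, so a column of `rows` floats needs rows * 4 bytes
        // (the original mapped rows * 8, twice the required size).
        MappedByteBuffer mbb = rw.getChannel().map(
                FileChannel.MapMode.READ_WRITE, 0, (long) rows * Float.BYTES);
        mbb.order(ByteOrder.nativeOrder());
        FloatBuffer db = mbb.asFloatBuffer();
        files.add(rw);
        buffers.add(db);
        return db;
    }

    public int rows() {
        return rows;
    }

    public int columns() {
        return buffers.size();
    }

    // Adds a new column backed by a file named after its index, e.g. "0000.float".
    public FloatBuffer addColumn() throws IOException {
        return addColumnForFile(new File(baseDir, String.format("%04d.float", buffers.size())));
    }

    public FloatBuffer getColumn(int n) {
        return buffers.get(n);
    }

    // Closes the underlying files and drops the references to the mapped buffers.
    public void close() throws IOException {
        for (RandomAccessFile file : files) {
            file.close();
        }
        files.clear();
        buffers.clear();
    }
}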

import java.io.IOException;
import java.nio.FloatBuffer;

public class SimpleTableTestMain {
    public static void main(String... args) throws IOException {
        long start = System.nanoTime();
        // Write 50 columns of 3 million rows each. create() fails if "test" already exists.
        SimpleTable st = SimpleTable.create("test", 3 * 1000 * 1000);
        for (int i = 0; i < 50; i++) {
            FloatBuffer db = st.addColumn();
            for (int j = 0; j < db.capacity(); j++)
                db.put(j, i + j);
        }
        st.close();

        long mid = System.nanoTime();

        // Reload the table from disk and sum every column.
        SimpleTable st2 = SimpleTable.load("test");
        for (int i = 0; i < 50; i++) {
            FloatBuffer db = st2.getColumn(i);
            double sum = 0;
            for (int j = 0; j < db.capacity(); j++)
                sum += db.get(j);
            assert sum > 0;
        }

        long end = System.nanoTime();
        System.out.printf("Took %.3f seconds to write and %.3f seconds to read %,d rows and %,d columns%n",
                (mid - start) / 1e9, (end - mid) / 1e9, st2.rows(), st2.columns());
        st2.close();
    }
}

prints

Took 2.070 seconds to write and 2.206 seconds to read 3,000,000 rows and 50 columns
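
Since the question asks about random sampling, here is a minimal sketch of how a random subset of rows might be read from one memory-mapped column without loading the whole file onto the heap. The class `MappedColumnSampler` and its methods are hypothetical names and are not part of the answer above. The index selection uses selection sampling (each row is picked with probability "still needed / still remaining"), which is an assumption about how the sampling step could be done, not the author's method.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical helper for illustration only.
final class MappedColumnSampler {

    // Maps a file holding `rows` floats (4 bytes each) as a FloatBuffer.
    static FloatBuffer mapColumn(Path file, int rows) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) rows * Float.BYTES)
                     .order(ByteOrder.nativeOrder())
                     .asFloatBuffer();
        }
    }

    // Picks k distinct row indices uniformly at random, returned in ascending order.
    static int[] sampleRowIndices(int rows, int k, Random rnd) {
        if (k < 0 || k > rows) throw new IllegalArgumentException("need 0 <= k <= rows");
        int[] picked = new int[k];
        int chosen = 0;
        for (int i = 0; i < rows && chosen < k; i++) {
            // Select row i with probability (k - chosen) / (rows - i).
            if (rnd.nextInt(rows - i) < k - chosen) {
                picked[chosen++] = i;
            }
        }
        return picked;
    }

    // Copies only the sampled rows of a column onto the heap.
    static float[] readRows(FloatBuffer column, int[] rowIndices) {
        float[] out = new float[rowIndices.length];
        for (int m = 0; m < rowIndices.length; m++) {
            out[m] = column.get(rowIndices[m]);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("column", ".float");
        file.toFile().deleteOnExit();
        int rows = 1_000_000;
        FloatBuffer column = mapColumn(file, rows);
        for (int j = 0; j < rows; j++) column.put(j, j * 0.5f);

        int[] idx = sampleRowIndices(rows, 5, new Random());
        float[] values = readRows(column, idx);
        for (int m = 0; m < idx.length; m++) {
            System.out.printf("row %d -> %.1f%n", idx[m], values[m]);
        }
    }
}
```

Because the indices come back sorted, the sampled rows are read in file order. To sample whole rows from a table with several columns, the same index array could be reused for every column.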

Regarding "java - Using Python for a problem Java cannot handle because of slow execution", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13193871/
