gpt4 book ai didi

java - 在内存中存储大 map

转载 作者:塔克拉玛干 更新时间:2023-11-01 23:06:33 32 4
gpt4 key购买 nike

首先是问题的背景:我有一个非常大的图表,存储成本约为 4GB。大约 3M 个节点和 34M 个边。我的程序采用这个大图并从中递归构建较小的图。在递归的每个级别,我都有两个图 - 原始图和从原始图创建的图。递归一直持续到图被缩减为非常小的图,比如大约 10 个节点。

由于我在整个程序执行过程中都需要这些图表,因此内存效率对我的应用程序至关重要。

这是我目前遇到的问题:这是从大图创建小图的算法:

/**
 * Builds a coarsened graph from {@code g}.
 *
 * <p>Each fine edge contributes weight to one or more coarse edges depending on
 * the coarse/fine (C/F) status of its endpoints: a C-C edge contributes its own
 * weight directly; an F endpoint is projected onto its coarse neighbors,
 * weighted by the interpolation coefficient {@code pij} of the connecting edge.
 *
 * @param g     the fine graph being coarsened
 * @param seeds ids of the coarse (seed) nodes; used only to presize the table
 * @return the coarse graph assembled from the accumulated edge weights
 */
public static Graph buildByTriples(Graph g, ArrayList<Integer> seeds) {
    // Collect every undirected edge exactly once. Edges are stored with
    // u < v (see Graph.addEdge), so keeping only the i < endpoint case
    // visits each edge a single time.
    ArrayList<Edge> edges = new ArrayList<>(g.getEdgeCount());
    for (int i = 0; i < g.size(); i++) {
        for (Edge e : g.adj(i)) {
            int v = e.getEndpoint(i);
            if (i < v) {
                edges.add(e);
            }
        }
    }

    Table<Integer, Integer, Double> coarseEdges = HashBasedTable.create(seeds.size(), seeds.size());
    // Accumulate coarse weights; a plain loop (rather than stream().forEach
    // with side effects) keeps the mutation of coarseEdges explicit.
    for (Edge e : edges) {
        int v = e.getV();
        int u = e.getU();
        if (g.isC(u) && g.isC(v)) { // C-C: the edge survives coarsening as-is
            addToTable(coarseEdges, u, v, e.getWeight());
        } else if (!g.isC(u) && g.isC(v)) { // F-C: project u onto its coarse neighbors
            for (Edge cEdge : g.cAdj(u)) {
                int nb = cEdge.getEndpoint(u);
                if (nb != v) { // skip the self-pair (v, v)
                    addToTable(coarseEdges, v, nb, cEdge.getPij() * e.getWeight());
                }
            }
        } else if (g.isC(u) && !g.isC(v)) { // C-F: symmetric to the F-C case
            for (Edge cEdge : g.cAdj(v)) {
                int nb = cEdge.getEndpoint(v);
                if (nb != u) {
                    addToTable(coarseEdges, u, nb, cEdge.getPij() * e.getWeight());
                }
            }
        } else { // F-F: project both endpoints onto their coarse neighbors
            for (Edge cEdgeU : g.cAdj(u)) {
                int uNb = cEdgeU.getEndpoint(u);
                for (Edge cEdgeV : g.cAdj(v)) {
                    int vNb = cEdgeV.getEndpoint(v);
                    if (uNb != vNb) { // no self loops in the coarse graph
                        addToTable(coarseEdges, uNb, vNb, cEdgeU.getPij() * e.getWeight() * cEdgeV.getPij());
                    }
                }
            }
        }
    }

    // Materialize the coarse graph from the accumulated (u, v) -> weight table.
    return createGraph(g, coarseEdges);
}

/**
 * Accumulates {@code val} onto the undirected coarse edge {@code (r, c)}.
 *
 * <p>Keys are normalized so the smaller node id is always the row and the
 * larger the column; (r, c) and (c, r) therefore share one accumulator.
 *
 * @param tbl accumulator table, keyed (min(r,c), max(r,c))
 * @param r   one endpoint's node id
 * @param c   the other endpoint's node id
 * @param val weight contribution to add
 */
private static void addToTable(Table<Integer, Integer, Double> tbl, int r, int c, double val) {
    int mn = Math.min(r, c); // the smaller of the two node ids
    // BUG FIX: was Math.min(r, c) again, so every entry landed on the
    // degenerate key (mn, mn) and (r, c)/(c, r) weights were corrupted.
    int mx = Math.max(r, c); // the larger of the two node ids
    Double existing = tbl.get(mn, mx); // single lookup instead of contains+get
    tbl.put(mn, mx, existing == null ? val : existing + val);
}

现在,当我这样做时,我很快就会耗尽内存。我用 YourKit 分析了应用程序,并且内存使用率超过了屋顶(在用完之前超过 6GB),因此 CPU 使用率也是如此。 coarseEdges 可以变得非常大。是否有更好的内存中 Map 实现可以扩展到大型数据集?或者有没有更好的方法在不存储 coarseEdges 的情况下执行此操作?

PS:请注意,我的图表无法在恒定时间内检索边(u,v)。它基本上是一个列表列表,这可以更好地为我的应用程序的其他关键部分提供性能。

**Also see my graph implementation code below:**
/**
 * Undirected graph backed by per-node adjacency lists ({@link EdgeList}).
 * Besides adjacency it maintains, per node: the summed weight of all incident
 * edges, the summed weight restricted to coarse neighbors, plain and coarse
 * degrees, a volume, and a coarse/fine flag.
 */
public class Graph{
    private final int SIZE;                       // number of nodes; node ids are 0..SIZE-1
    private final EdgeList[] adjacency;           // all incident edges per node
    private final EdgeList[] coarseAdjacency;     // incident edges whose opposite endpoint is coarse
    private final float[] nodeVolumes;            // per-node volume, initialized to 1
    private final double[] incidentWeight;        // sum of weights of all incident edges
    private final double[] incidentCoarseWeight;  // same sum restricted to coarse neighbors
    private final int[] degrees;                  // number of incident edges
    private final int[] coarseDegrees;            // number of coarse neighbors
    private final boolean[] coarseFlag;           // true when the node is a coarse (seed) node
    private int edgeCount = 0;

    public Graph(int SIZE){
        this.SIZE = SIZE;
        adjacency = new EdgeList[SIZE];
        coarseAdjacency = new EdgeList[SIZE];
        nodeVolumes = new float[SIZE];
        coarseFlag = new boolean[SIZE];
        incidentWeight = new double[SIZE];
        incidentCoarseWeight = new double[SIZE];
        degrees = new int[SIZE];
        coarseDegrees = new int[SIZE];
        for (int node = 0; node < SIZE; node++) {
            adjacency[node] = new EdgeList();
            coarseAdjacency[node] = new EdgeList();
            nodeVolumes[node] = 1;
        }
    }

    /**
     * Adds an undirected edge of weight {@code w} between {@code u} and
     * {@code v}. The edge is normalized so its stored endpoints satisfy
     * u &lt; v, and one shared Edge object is placed in both adjacency lists.
     *
     * @throws UnsupportedOperationException on a self loop (u == v)
     */
    public void addEdge(int u, int v, double w){
        if (u == v) {
            throw new UnsupportedOperationException("Self loops not allowed in graph"); //TODO: Need a graph validation routine
        }
        Edge e = (u < v) ? new Edge(u, v, w) : new Edge(v, u, w);
        adjacency[u].add(e);
        adjacency[v].add(e);
        // keep per-node weighted sums in step with the new edge
        incidentWeight[u] += w;
        incidentWeight[v] += w;
        // keep per-node degrees in step as well
        degrees[u]++;
        degrees[v]++;
        edgeCount++;
    }

    /** Returns the number of nodes. */
    public int size(){
        return SIZE;
    }

    /** Returns all edges incident to {@code v}. */
    public EdgeList adj(int v){
        return adjacency[v];
    }

    /** Returns the edges from {@code v} to its coarse neighbors. */
    public EdgeList cAdj(int v){
        return coarseAdjacency[v];
    }

    /** Sorts {@code u}'s adjacency list in place with {@code c}. */
    public void sortAdj(int u, Comparator<Edge> c){
        adjacency[u].sort(c);
    }

    /** Sorts {@code u}'s coarse adjacency list in place with {@code c}. */
    public void sortCoarseAdj(int u, Comparator<Edge> c){
        coarseAdjacency[u].sort(c);
    }

    /**
     * Marks {@code node} coarse (or fine). When marking coarse, every
     * neighbor's coarse adjacency, coarse weighted sum, and coarse degree are
     * updated; un-marking performs no bookkeeping rollback.
     */
    public void setCoarse(int node, boolean c){
        coarseFlag[node] = c;
        if (!c) {
            return;
        }
        // register the newly coarse node with each of its neighbors
        for (Edge e : adj(node)) {
            int neighbor = e.getEndpoint(node);
            coarseAdjacency[neighbor].add(e);
            incidentCoarseWeight[neighbor] += e.getWeight();
            coarseDegrees[neighbor]++;
        }
    }

    /** Returns the total number of edges added so far. */
    public int getEdgeCount(){
        return edgeCount;
    }

    /** Returns true when node {@code id} is coarse. */
    public boolean isC(int id){
        return coarseFlag[id];
    }

    /** Returns the summed weight of all edges incident to {@code node}. */
    public double weightedDegree(int node){
        return incidentWeight[node];
    }

    /** Returns the summed weight of edges from {@code node} to coarse neighbors. */
    public double weightedCoarseDegree(int node){
        return incidentCoarseWeight[node];
    }

    /** Returns the number of edges incident to {@code u}. */
    public int degree(int u){
        return degrees[u];
    }

    /** Returns the number of coarse neighbors of {@code u}. */
    public int cDegree(int u){
        return coarseDegrees[u];
    }

    /** Returns the {@code idx}-th edge in {@code u}'s coarse adjacency list. */
    public Edge getCNeighborAt(int u, int idx){
        return coarseAdjacency[u].getAt(idx);
    }

    /** Returns the volume of node {@code u}. */
    public float volume(int u){
        return nodeVolumes[u];
    }

    /** Sets the volume of {@code node} to {@code v}. */
    public void setVolume(int node, float v){
        nodeVolumes[node] = v;
    }

    @Override
    public String toString() {
        return "Graph[nodes:" + SIZE + ",edges:" + edgeCount + "]";
    }

}


//Edges are first class objects.
/**
 * A first-class undirected edge between nodes {@code u} and {@code v}.
 * Carries a weight, an interpolation coefficient {@code pij}, an algebraic
 * distance, and a soft-delete flag that iteration (see EdgeList) honors.
 */
public class Edge {
    private int u;
    private int v;
    private double weight;
    private double pij;
    // default algebraic distance: reciprocal of the global epsilon
    private double algebraicDist = (1/Constants.EPSILON);
    private boolean deleted = false; // soft-delete marker; edge stays in lists

    /** Creates an edge between {@code u} and {@code v} with the given weight. */
    public Edge(int u, int v, double weight) {
        this.u = u;
        this.v = v;
        this.weight = weight;
    }

    /** Creates an uninitialized edge (all fields at defaults). */
    public Edge() {
    }

    /**
     * Returns the endpoint opposite {@code from}. Note: when {@code from}
     * matches neither endpoint, {@code v} is returned rather than failing.
     */
    public int getEndpoint(int from){
        return (from == v) ? u : v;
    }

    public int getU() {
        return u;
    }

    public void setU(int u) {
        this.u = u;
    }

    public int getV() {
        return v;
    }

    public void setV(int v) {
        this.v = v;
    }

    public double getWeight() {
        return weight;
    }

    public void setWeight(double weight) {
        this.weight = weight;
    }

    public double getPij() {
        return pij;
    }

    public void setPij(double pij) {
        this.pij = pij;
    }

    public double getAlgebraicDist() {
        return algebraicDist;
    }

    public void setAlgebraicDist(double algebraicDist) {
        this.algebraicDist = algebraicDist;
    }

    public boolean isDeleted() {
        return deleted;
    }

    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }

    @Override
    public String toString() {
        return "Edge[u:" + u + ", v:" + v + "]";
    }
}


// The Edge iterable
/**
 * An iterable list of edges whose iterator transparently skips edges marked
 * deleted (soft deletion; see {@code Edge.isDeleted()}).
 */
public class EdgeList implements Iterable<Edge>{
    private final ArrayList<Edge> data = new ArrayList<>(); // was a raw ArrayList

    /** Appends {@code e}; duplicates and deleted edges are not rejected. */
    public void add(Edge e){
        data.add(e);
    }

    @Override
    public Iterator<Edge> iterator() {
        return new IteratorImpl();
    }

    private class IteratorImpl implements Iterator<Edge> {

        private int currentIndex = 0;
        // Size is snapshotted at creation: edges added during iteration are
        // not visited.
        private final int N = data.size();

        @Override
        public boolean hasNext() {
            // advance past soft-deleted edges
            while (currentIndex < N && data.get(currentIndex).isDeleted()) {
                currentIndex++;
            }
            return currentIndex < N;
        }

        @Override
        public Edge next() {
            // BUG FIX: the original returned data.get(currentIndex++) directly,
            // so calling next() without hasNext() could return a deleted edge
            // and overran the list with IndexOutOfBoundsException instead of
            // the Iterator-contract NoSuchElementException.
            if (!hasNext()) {
                throw new java.util.NoSuchElementException();
            }
            return data.get(currentIndex++);
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    /** Returns the edge at {@code idx}, including soft-deleted ones. */
    public Edge getAt(int idx){
        return data.get(idx);
    }

    /** Sorts the underlying list in place with {@code c}. */
    public void sort(Comparator<Edge> c){
        data.sort(c);
    }
}

最佳答案

这里有一些可以盲试的方向（blind shots）——您需要实际实现它们才能看到各自有多大帮助。

1) 您可能会考虑将组合键 (int,int) 与 hashmap 一起使用,而不是 guava 表。对于边缘权重来说肯定会更有效。如果您需要查询从某个顶点传出的边,那么它就不那么明显了,但是您需要查看 cpu 与内存的权衡。

2) 如果你使用普通的 hashmap，可以考虑换成一种堆外（off-heap）实现，例如 https://github.com/OpenHFT/Chronicle-Map ，它可能会有帮助。

3) 如果你留在内存中，想挤出一些额外的空间，可以用原始类型（primitive）map 做一些技巧：使用 long->double 映射，例如 http://labs.carrotsearch.com/download/hppc/0.4.1/api/com/carrotsearch/hppc/LongDoubleMap.html 或 http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/TLongDoubleHashMap.html ，将您的 2 个 int 顶点 id 编码成一个 long 作为键，再看看能省多少。如果您使用的是 64 位 JVM，一个 Integer 可以占用 16 个字节（假设启用压缩 oops），一个 Double 占 24 个字节——每个条目合计 16+16+24=56 个字节，而原始类型映射每条目仅需 8+8 个字节。

关于java - 在内存中存储大 map ,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38920824/

32 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com