
java - Calculating Jaccard similarity in Java

Reposted. Author: 行者123. Updated: 2023-11-30 06:46:19

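I have the following code, which loops over a list of arrays (mainItems), finds the two most similar arrays, and puts them into sortedTransactions. It works fine for small data (10,000 transactions), but for 88,000 transactions it runs forever. What can be done to make it work for large data?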

import java.util.*;

public class Sort {

    static private List<Transactions> trans = ReadFile.transactions;
    static public List<int[]> mainItems;
    static public ArrayList<int[]> sortedTransactions = new ArrayList<int[]>();

    static {
        mainItems = new ArrayList<int[]>();

        for (Transactions t : trans) {
            mainItems.add(t.getItems());
        }
    }

    static private double jaccardSimilarity(int[] a, int[] b) {

        Set<Integer> s1 = new LinkedHashSet<Integer>();
        for (int i = 0; i < a.length; i++) {
            s1.add(a[i]);
        }
        Set<Integer> s2 = new LinkedHashSet<Integer>();
        for (int i = 0; i < b.length; i++) {
            s2.add(b[i]);
        }

        Set<Integer> intersection = new LinkedHashSet<>(s1);
        intersection.retainAll(s2);

        Set<Integer> union = new LinkedHashSet<Integer>(s1);
        union.addAll(s2);

        double jaccardSimilarity = (double) intersection.size() / (double) union.size();
        //System.out.println(intersection);
        return jaccardSimilarity;
    }

    static private boolean isAllEqual(List<Double> a) {

        for (int i = 1; i < a.size(); i++) {
            // Compare by value: != on two Double objects compares references.
            if (!a.get(0).equals(a.get(i))) {
                return false;
            }
        }

        return true;
    }

    static public void generatePairs() {

        for (int i = 0; i < mainItems.size() - 1; i++) {

            // Note: contains() on a list of int[] is a linear scan by reference.
            if (!sortedTransactions.contains(mainItems.get(i))) {

                List<Double> myd = new ArrayList<Double>();
                List<int[]> mys = new ArrayList<int[]>();

                for (int j = i + 1; j < mainItems.size(); j++) {

                    if (!sortedTransactions.contains(mainItems.get(j))) {

                        myd.add(jaccardSimilarity(mainItems.get(i), mainItems.get(j)));
                        mys.add(mainItems.get(j));
                    }
                }

                if (!isAllEqual(myd)) {

                    sortedTransactions.add(mainItems.get(i));
                    sortedTransactions.add(mys.get(maxValue(myd)));
                }
            }
        }
    }

    // Returns the index of the first maximum value in d.
    static private int maxValue(List<Double> d) {

        double max = d.get(0);
        int f = 0;

        for (int i = 1; i < d.size(); i++) {

            if (d.get(i) > max) {

                max = d.get(i);
                f = i;
            }
        }
        return f;
    }
}

Best answer

You don't have to build the union: union(s1, s2).size() equals s1.size() + s2.size() - intersection(s1, s2).size().

static private double jaccardSimilarity(int[] a, int[] b) {

    Set<Integer> s1 = new HashSet<Integer>();
    for (int i = 0; i < a.length; i++) {
        s1.add(a[i]);
    }
    Set<Integer> s2 = new HashSet<Integer>();
    for (int i = 0; i < b.length; i++) {
        s2.add(b[i]);
    }

    // Capture both sizes first: retainAll() turns s1 into the intersection.
    final int sa = s1.size();
    final int sb = s2.size();
    s1.retainAll(s2);
    final int intersection = s1.size();
    // |union| = |s1| + |s2| - |intersection|
    return 1d / (sa + sb - intersection) * intersection;
}
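
The answer above only speeds up a single similarity call. As a rough sketch of how the same idea might be pushed further (this is not part of the original answer), the code below sorts and de-duplicates each array once, computes the intersection with a two-pointer merge so no sets are allocated per pair, and replaces the `sortedTransactions.contains(...)` list scans with a `boolean[]` flag per transaction. The names `JaccardSketch`, `jaccardSorted`, `normalize`, and `pairUp` are hypothetical, and returning 0.0 for two empty arrays is an assumption (the original code yields NaN there).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JaccardSketch {

    // Jaccard similarity of two arrays that are already sorted and duplicate-free.
    static double jaccardSorted(int[] a, int[] b) {
        int i = 0, j = 0, intersection = 0;
        // Two-pointer merge: advance whichever side holds the smaller value.
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                intersection++;
                i++;
                j++;
            }
        }
        int union = a.length + b.length - intersection;
        // Assumption: two empty arrays are treated as similarity 0.0.
        return union == 0 ? 0.0 : (double) intersection / union;
    }

    // Returns a sorted, duplicate-free copy; the input array is not modified.
    static int[] normalize(int[] items) {
        return Arrays.stream(items).sorted().distinct().toArray();
    }

    // Mirrors the pairing loop from the question, with used[] instead of contains().
    static List<int[]> pairUp(List<int[]> items) {
        int n = items.size();
        int[][] norm = new int[n][];
        for (int k = 0; k < n; k++) {
            norm[k] = normalize(items.get(k));
        }

        boolean[] used = new boolean[n];
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < n - 1; i++) {
            if (used[i]) continue;
            int best = -1;
            double max = -1.0, min = 2.0;
            for (int j = i + 1; j < n; j++) {
                if (used[j]) continue;
                double s = jaccardSorted(norm[i], norm[j]);
                if (s > max) { max = s; best = j; } // keeps the first maximum
                if (s < min) min = s;
            }
            // Like the question's isAllEqual check: skip when all scores are equal.
            if (best >= 0 && max > min) {
                used[i] = true;
                used[best] = true;
                result.add(items.get(i));
                result.add(items.get(best));
            }
        }
        return result;
    }
}
```

The pairing loop is still quadratic in the number of transactions, so this only removes the per-pair set allocation and the linear `contains` scans; whether that is enough for 88,000 transactions would need to be measured.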

About java - Calculating Jaccard similarity in Java: a similar question was found on Stack Overflow: https://stackoverflow.com/questions/43634867/
