
mysql - Converting a MySQL table into a Cassandra ColumnFamily: Slow batch mutations with Hector


I have a very large MySQL table (a billion rows, with a few dozen columns) that I want to convert into a ColumnFamily in Cassandra. I am using Hector.

First I create my schema:

String clusterName = "Test Cluster";
String host = "cassandra.lanhost.com:9160";
String newKeyspaceName = "KeyspaceName";
String newColumnFamilyName = "CFName";

ThriftCluster cassandraCluster;
CassandraHostConfigurator cassandraHostConfigurator;

cassandraHostConfigurator = new CassandraHostConfigurator(host);
cassandraCluster = new ThriftCluster(clusterName, cassandraHostConfigurator);

BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName(newKeyspaceName);
columnFamilyDefinition.setName(newColumnFamilyName);
columnFamilyDefinition.setDefaultValidationClass("UTF8Type");
columnFamilyDefinition.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
columnFamilyDefinition.setComparatorType(ComparatorType.UTF8TYPE);

BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("id"));
columnDefinition.setIndexType(ColumnIndexType.KEYS);
columnDefinition.setValidationClass(ComparatorType.INTEGERTYPE.getClassName());
columnDefinition.setIndexName("id_index");
columnFamilyDefinition.addColumnDefinition(columnDefinition);

columnDefinition = new BasicColumnDefinition();
columnDefinition.setName(StringSerializer.get().toByteBuffer("status"));
columnDefinition.setIndexType(ColumnIndexType.KEYS);
columnDefinition.setValidationClass(ComparatorType.ASCIITYPE.getClassName());
columnDefinition.setIndexName("status_index");
columnFamilyDefinition.addColumnDefinition(columnDefinition);

.......

ColumnFamilyDefinition cfDef = new ThriftCfDef(columnFamilyDefinition);

KeyspaceDefinition keyspaceDefinition =
HFactory.createKeyspaceDefinition(newKeyspaceName, "org.apache.cassandra.locator.SimpleStrategy", 1, Arrays.asList(cfDef));

cassandraCluster.addKeyspace(keyspaceDefinition);

Once that is done, I load the data, which is stored in a List because I fetch the MySQL rows with a NamedParameterJdbcTemplate, like this:

String clusterName = "Test Cluster";
String host = "cassandra.lanhost.com:9160";
String KeyspaceName = "KeyspaceName";
String ColumnFamilyName = "CFName";
final StringSerializer serializer = StringSerializer.get();

public void insert(List<SqlParameterSource> dataToInsert) throws ExceptionParserInterrupted {

Keyspace workingKeyspace = null;
Cluster cassandraCluster = HFactory.getOrCreateCluster(clusterName, host);
workingKeyspace = HFactory.createKeyspace(KeyspaceName, cassandraCluster);
Mutator<String> mutator = HFactory.createMutator(workingKeyspace, serializer);

ColumnFamilyTemplate<String, String> template = new ThriftColumnFamilyTemplate<String, String>(workingKeyspace, ColumnFamilyName, serializer, serializer);

long t1 = System.currentTimeMillis();

for (SqlParameterSource data : dataToInsert) {

String keyId = "id" + (Integer) data.getValue("id");

mutator.addInsertion(keyId, ColumnFamilyName, HFactory.createColumn("id", (Integer) data.getValue("id"), StringSerializer.get(), IntegerSerializer.get()));
mutator.addInsertion(keyId,ColumnFamilyName, HFactory.createStringColumn("status", data.getValue("status").toString()));

...............

}

mutator.execute();

System.out.println(System.currentTimeMillis() - t1);
}

I am inserting about 100,000 rows per hour, which is really slow. I have heard about multithreaded inserts, but I don't know how to apply that in this particular case. Should I use BatchMutate?

Best Answer

Yes, you should run your insertion code from multiple threads. Have a look at the following stress-test code for an example of how to do this efficiently with Hector: https://github.com/zznate/cassandra-stress
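As a minimal sketch of that approach (the Hector mutator calls are replaced by a thread-safe counter so the example is self-contained; `BATCH_SIZE` and `THREADS` are illustrative values, not tuned recommendations), the idea is to split the rows into fixed-size batches, hand the batches to a thread pool, and call `execute()` once per batch rather than once for the entire list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBatchInsert {

    static final int BATCH_SIZE = 500;   // rows buffered before each flush
    static final int THREADS = 4;        // number of concurrent writers

    // Stand-in for the per-batch Hector work: in the real converter each
    // call would create its own Mutator, addInsertion() every row in the
    // batch, then mutator.execute() once.
    static final AtomicInteger inserted = new AtomicInteger();

    static void insertBatch(List<Integer> batch) {
        // Mutator<String> mutator = HFactory.createMutator(keyspace, serializer);
        // for (Integer id : batch) { mutator.addInsertion(...); }
        // mutator.execute();
        inserted.addAndGet(batch.size());
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder for the row ids fetched from MySQL.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) rows.add(i);

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int start = 0; start < rows.size(); start += BATCH_SIZE) {
            final List<Integer> batch =
                rows.subList(start, Math.min(start + BATCH_SIZE, rows.size()));
            pool.submit(() -> insertBatch(batch));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        System.out.println(inserted.get());
    }
}
```

In the real converter each worker thread should create its own `Mutator`; Hector mutators are not thread-safe, so one instance must not be shared across threads.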

Another source of insert-performance problems may be the number of secondary indexes you have applied to the column family (each secondary index creates an additional column family "behind the scenes").

A correctly designed data model should not need a large number of secondary indexes. The following article gives a good overview of data modeling in Cassandra: http://www.datastax.com/docs/1.0/ddl/index

Regarding mysql - Converting a MySQL table into a Cassandra ColumnFamily: Slow batch mutations with Hector, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8519166/
