
hadoop - Pig integration with Cassandra: simple distributed query takes a few minutes to complete. Is this normal?

Reposted. Author: 可可西里. Updated: 2023-11-01 16:34:39

I set up a test integration of Cassandra + Pig/Hadoop: 8 nodes act as Cassandra + TaskTracker nodes, and 1 node acts as the JobTracker/NameNode.

I started the Cassandra client and created some simple data, following the Readme.txt in the Cassandra distribution:

[default@unknown] create keyspace Keyspace1;
[default@unknown] use Keyspace1;
[default@Keyspace1] create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;
[default@Keyspace1] set Users[jsmith][first] = 'John';
[default@Keyspace1] set Users[jsmith][last] = 'Smith';
[default@Keyspace1] set Users[jsmith][age] = long(42);

Then I ran the sample Pig query listed in CASSANDRA_HOME (using pig_cassandra):

grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
grunt> cols = FOREACH rows GENERATE flatten(columns);
grunt> colnames = FOREACH cols GENERATE $0;
grunt> namegroups = GROUP colnames BY (chararray) $0;
grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
grunt> orderednames = ORDER namecounts BY $0;
grunt> topnames = LIMIT orderednames 50;
grunt> dump topnames;

It takes about 3 minutes to complete.

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
1.0.0          0.9.1       root    2012-01-12 22:16:53  2012-01-12 22:20:22  GROUP_BY,ORDER_BY,LIMIT
Success!

Job Stats (time in seconds):
JobId                  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias                                     Feature            Outputs
job_201201121817_0010  8     1        12          6           9           21             21             21             colnames,cols,namecounts,namegroups,rows  GROUP_BY,COMBINER
job_201201121817_0011  1     1        6           6           6           15             15             15             orderednames                              SAMPLER
job_201201121817_0012  1     1        9           9           9           15             15             15             orderednames                              ORDER_BY,COMBINER  hdfs://xxxx/tmp/temp-744158198/tmp-1598279340,

Input(s):
Successfully read 1 records (3232 bytes) from: "cassandra://Keyspace1/Users"

Output(s):
Successfully stored 3 records (63 bytes) in: "hdfs://xxxx/tmp/temp-744158198/tmp-1598279340"

Counters:
Total records written : 3
Total bytes written : 63
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

There are no errors or warnings in the logs.

Is this normal, or is something wrong?

Best answer

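Yes, this is normal: running a Map/Reduce job on Hadoop typically takes about 1 minute just to start up, and Pig generates multiple Map/Reduce jobs depending on the complexity of the script. The Job Stats in the question list three jobs, so a total of a few minutes is expected even for a single input row.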
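As a rough sanity check of that explanation, the arithmetic can be sketched in Python using only the figures from the run summary in the question. The assumptions here are mine, not the answerer's: the three jobs are treated as running strictly one after another, and each job's useful work is approximated as its slowest map task plus its slowest reduce task. Whatever remains is counted as scheduling and startup overhead.

```python
from datetime import datetime

# Timestamps copied from the StartedAt / FinishedAt columns in the question.
FMT = "%Y-%m-%d %H:%M:%S"
started = datetime.strptime("2012-01-12 22:16:53", FMT)
finished = datetime.strptime("2012-01-12 22:20:22", FMT)
wall_seconds = int((finished - started).total_seconds())

# (MaxMapTime, MaxReduceTime) in seconds, copied from the Job Stats table.
jobs = {
    "job_201201121817_0010": (12, 21),
    "job_201201121817_0011": (6, 15),
    "job_201201121817_0012": (9, 15),
}

# Assumption: jobs run sequentially, and slowest map + slowest reduce
# approximates the time each job spends doing actual task work.
task_seconds = sum(max_map + max_reduce for max_map, max_reduce in jobs.values())

# Everything else is treated as job setup, scheduling, and cleanup overhead.
overhead_seconds = wall_seconds - task_seconds
overhead_per_job = overhead_seconds / len(jobs)

print(f"wall={wall_seconds}s tasks={task_seconds}s "
      f"overhead={overhead_seconds}s (~{overhead_per_job:.0f}s per job)")
# prints: wall=209s tasks=78s overhead=131s (~44s per job)
```

Under these assumptions, well over half of the 209 seconds is spent outside the tasks themselves, which fits the answer's point that fixed per-job cost, not data volume, dominates a query this small.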

Regarding "hadoop - Pig integration with Cassandra: simple distributed query takes a few minutes to complete. Is this normal?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8846788/
