hadoop - Install and configure elasticsearch on hadoop

Reposted. Author: 可可西里. Updated: 2023-11-01 14:45:25

I have read through this page and other related links on installing and configuring elasticsearch on hadoop: Install and Configure elasticsearch on hadoop?

However, I still have some questions.

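I use elasticsearch and spark/hadoop separately. Specifically, I am using Cloudera Hadoop, and elasticsearch runs on other Linux machines. In Hadoop, I have an edge node from which I submit a Spark job; the job runs on 6 other nodes through executors, with configuration.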

  1. The installation guide below does not give me much information.

(1) Does Elasticsearch keep its indices on HDFS when I install the elasticsearch-hadoop binary on each node? (2) If so, where do I need to put the jar binaries? elasticsearch-spark_2.11-2.2.0.jar, elasticsearch-hadoop-2.2.0.jar

In fact, using elasticsearch-hadoop-2.2.0.jar, I can already read documents from and write documents to elasticsearch running on the Linux machines:

    import org.elasticsearch.spark._  // brings saveToEs into scope on RDDs
    sc.makeRDD(docs).saveToEs(indexname + "/" + typename, Map("es.nodes" -> ES_HOSTN_ODE_ADDRESS, "es.port" -> ES_HOST_PORT))

(3) How do I need to set the data path for HDFS? Currently elasticsearch is configured with

path.data: /data1,/data2,/data3,/data4

(4) Are there other good documents/pages to refer to?

elasticsearch-hadoop binaries can be obtained either by downloading them from the elastic.co site as a ZIP (containing project jars, sources and documentation) or by using any Maven-compatible tool with the following dependency:


<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop</artifactId>
<version>2.2.0</version>
</dependency>

jar above contains all the features of elasticsearch-hadoop and does not require any other dependencies at runtime; in other words it can be used as is. elasticsearch-hadoop binary is suitable for both Hadoop 1.x and Hadoop 2.x (also known as YARN) environments without any changes.

  2. Using Scala on Spark, I can query documents from elasticsearch, but the queried data is not parallelized across the executors. If I use elasticsearch-hadoop and query data from HDFS, is the data automatically parallelized (as an RDD), like parquet files, without having to call

    sc.parallelize(data);

  3. For me, pushing/pulling big data is important, together with visualization using several tools such as Kibana. Does elasticsearch-hadoop have any strong advantages over elasticsearch?

Best answer

From the Elasticsearch website:

Elasticsearch for Apache Hadoop is an open-source, stand-alone, self-contained, small library that allows Hadoop jobs (whether using Map/Reduce or libraries built upon it such as Hive, Pig or Cascading or new upcoming libraries like Apache Spark) to interact with Elasticsearch. One can think of it as a connector that allows data to flow bi-directionally so that applications can leverage transparently the Elasticsearch engine capabilities to significantly enrich their capabilities and increase the performance.

Answering according to my understanding of your questions:

(1) Elasticsearch keeps indices on HDFS when I install elasticsearch-hadoop binary on each node?

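No. Elasticsearch-Hadoop is a library through which Hadoop jobs can load data from, or store data in, Elasticsearch. It does not move Elasticsearch's indices onto HDFS.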

(2) If so, where do I need to put jar binary? elasticsearch-spark_2.11-2.2.0.jar elasticsearch-hadoop-2.2.0.jar

These libraries must be on the classpath of the Spark job (see: Add external jars to classpath) or of the Hadoop job (see: Add external jars to classpath).
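One common way to get the connector onto the classpath is to declare it as a build dependency and ship it with the job. The fragment below is only a sketch of an sbt build: the Scala and Spark versions are assumptions chosen to match the `_2.11` jar named in the question, and should be replaced with whatever the cluster actually runs.

```scala
// build.sbt (sketch; scalaVersion and the Spark version are assumptions)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark itself is supplied by the cluster, so it is not bundled with the job
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  // With Scala 2.11 this resolves to elasticsearch-spark_2.11-2.2.0.jar
  "org.elasticsearch" %% "elasticsearch-spark" % "2.2.0"
)
```

Alternatively, an already downloaded jar can be passed at submit time with `spark-submit --jars` followed by the path to the jar, which ships it to the executors so that it does not have to be copied onto each node by hand.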

(3) how do I need to set data path for hdfs?

I don't think this step is needed in order to access Elasticsearch data from Hadoop/Spark jobs.

(4) are there some other good documents/pages to refer ?

For my own purposes I referred to Elasticsearch Apache Spark : native support.

  2. Using scala on spark, I can query documents from elasticsearch, but the queried data is not parallelized for each executor. If I use elasticsearch-hadoop and query data from hdfs, the data is automatically parallelized (RDD) like parquet files without having to sc.parallelize(data);

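Yes, you are right. The advantage of using Elasticsearch-Hadoop and Elasticsearch-spark over the Elasticsearch Java/Scala client is the same as the inherent advantage of using Hadoop or Spark: the processing load is distributed across the cluster.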
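As a rough illustration of reading through the connector, the sketch below uses `esRDD` from elasticsearch-spark, which returns an ordinary RDD whose partitions are derived from the index's shards, so no `sc.parallelize` call is involved. The host name, port, and the `myindex/mytype` resource are hypothetical placeholders, and the job needs a reachable Elasticsearch cluster to run.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

object ReadFromEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("read-from-es")
      // Hypothetical address of an Elasticsearch node outside the Hadoop cluster
      .set("es.nodes", "es-host.example.com")
      .set("es.port", "9200")
    val sc = new SparkContext(conf)

    // Each element is (document id, document fields); "myindex/mytype" is a placeholder
    val docs = sc.esRDD("myindex/mytype")

    // The RDD is already partitioned, so executors read their parts in parallel
    println(s"partitions: ${docs.partitions.length}")
    println(s"documents: ${docs.count()}")

    sc.stop()
  }
}
```

If the partition count printed here is 1, the index probably has a single shard, which would explain reads that do not spread across executors.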

  3. For me, pushing/pulling big data are important with visualization using several tools such as Kibana. Are there any strong advantages elasticsearch-hadoop against elasticsearch?

As mentioned earlier, "elasticsearch-hadoop" is just a library.

Regarding "hadoop - Install and configure elasticsearch on hadoop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35901289/
