prometheus - Unable to start the Prometheus server

Reposted by: 行者123  Updated: 2023-12-05 08:20:12

I installed Prometheus on an Amazon Linux 2 instance. This is the configuration I use in the user data:

cat << EOF > /etc/systemd/system/prometheus.service 
[Unit] 
Description=Prometheus Server 
Documentation=https://prometheus.io/docs/introduction/overview/ 
Wants=network-online.target
After=network-online.target

[Service] 
User=prometheus 
Restart=on-failure 

# Change this line if Prometheus is installed
# under a different path or runs as a different user
ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data

[Install] 
WantedBy=multi-user.target 
EOF

cat << EOF > /home/prometheus/prometheus/prometheus.yml 
# my global config 
global: 
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute. 
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute. 
  # scrape_timeout is set to the global default (10s). 

# Alertmanager configuration 
alerting: 
  alertmanagers: 
  - static_configs: 
    - targets: 
      # - alertmanager:9093 

# Load rules once and periodically evaluate them according to the global evaluation_interval. 
rule_files: 
  # - "first_rules.yml" 
  # - "second_rules.yml" 

# A scrape configuration containing exactly one endpoint to scrape: 
# Here it's Prometheus itself. 
scrape_configs: 
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config. 
  - job_name: 'prometheus' 

    # metrics_path defaults to '/metrics' 
    # scheme defaults to 'http'. 

    static_configs: 
    - targets: ['localhost:9090'] 
  - job_name: 'node_prometheus' 

    # metrics_path defaults to '/metrics' 
    # scheme defaults to 'http'. 

    static_configs: 
    - targets: ['localhost:9100'] 
  - job_name: 'grafana' 

    # metrics_path defaults to '/metrics' 
    # scheme defaults to 'http'. 

    static_configs: 
# set this to the Grafana ALB
    - targets: ['${grafana_dns}'] 

  - job_name: 'sqs_exporter' 
    scrape_interval: 30s 
    scrape_timeout: 30s 
    static_configs: 
    - targets: ['localhost:9434'] 

  - job_name: 'cloudwatch_exporter' 
    scrape_interval: 5m 
    scrape_timeout: 60s 
    static_configs: 
    - targets: ['localhost:9106'] 

  - job_name: '_metrics' 
    metric_relabel_configs: 
    relabel_configs: 
     - source_labels: 
       - __meta_ec2_platform 
       action: keep 
       regex: .*windows.* 
     - action: labelmap 
       regex: __meta_ec2_tag_(.*) 
       replacement: \$1 
    ec2_sd_configs: 
      - region: eu-west-1 
        port: 9543 

  - job_name: 'cadvisor' 
    static_configs: 
    - targets: ['localhost:8080'] 

  - job_name: 'elasticbeanstalk_exporter' 
    static_configs: 
    - targets: ['localhost:9552'] 

EOF



systemctl daemon-reload 
systemctl enable prometheus
systemctl start prometheus
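
Before starting the service it can help to rule out a configuration error first. The following is a minimal sketch, assuming promtool sits next to the prometheus binary as it does in the release tarball. The PROMTOOL variable and the validate_config function name are made up for this example.

```shell
# Assumed location of promtool; override PROMTOOL if it lives elsewhere.
PROMTOOL="${PROMTOOL:-/home/prometheus/prometheus/promtool}"

# Run "promtool check config" on the given file.
# Returns 127 if promtool is not executable at $PROMTOOL.
validate_config() {
  cfg="$1"
  if [ ! -x "$PROMTOOL" ]; then
    echo "promtool not found at $PROMTOOL" >&2
    return 127
  fi
  "$PROMTOOL" check config "$cfg"
}

# Example, run on the instance before "systemctl start prometheus":
#   validate_config /home/prometheus/prometheus/prometheus.yml
```

If the check passes and the service still exits right away, the cause is more likely in the environment (paths, permissions) than in prometheus.yml.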

When I check whether Prometheus is running, I get this:

[ec2-user@ip-10-193-192-49 ~]$  sudo systemctl status prometheus
● prometheus.service - Prometheus Server
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2019-12-02 11:12:33 UTC; 4s ago
     Docs: https://prometheus.io/docs/introduction/overview/
  Process: 22507 ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml --storage.tsdb.path=/app/prometheus/data (code=exited, status=2)
 Main PID: 22507 (code=exited, status=2)

Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service failed.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service holdoff time over, scheduling restart.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: start request repeated too quickly for prometheus.service
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Failed to start Prometheus Server.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 11:12:33 ip-10-193-192-49.service.app systemd[1]: prometheus.service failed.
[ec2-user@ip-10-193-192-49 ~]$

I installed Prometheus version 2.14.0. Can anyone help?

I commented out the line Restart=on-failure in /etc/systemd/system/prometheus.service and then ran:

systemctl daemon-reload 
systemctl status prometheus

I got this:

Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: start request repeated too quickly for prometheus.service
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: Failed to start Prometheus Server.
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 12:57:52 ip-10-193-192-58.service.app systemd[1]: prometheus.service failed.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Started Prometheus Server.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Starting Prometheus Server...
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.686Z caller=main.go:296 msg="no time or size retention was set so
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:332 msg="Starting Prometheus" version="(versio
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:333 build_context="(go=go1.13.4, user=root@df2
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:334 host_details="(Linux 4.14.77-81.59.amzn2.x
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:335 fd_limits="(soft=1024, hard=4096)"
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=info ts=2019-12-02T12:58:03.687Z caller=main.go:336 vm_limits="(soft=unlimited, hard=unlimited
Dec 02 12:58:03 ip-10-193-192-58.service.app prometheus[23391]: level=error ts=2019-12-02T12:58:03.692Z caller=query_logger.go:85 component=activeQueryTracker msg="
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: prometheus.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: Unit prometheus.service entered failed state.
Dec 02 12:58:03 ip-10-193-192-58.service.app systemd[1]: prometheus.service failed.

Best answer

I had the same problem. The cause was that /data/prometheus must be owned by the prometheus user and group.

So the solution is: sudo chown -R prometheus:prometheus /data/prometheus/

In your case the path is actually /app/prometheus/data
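
Applied to the unit file in the question, the fix can be sketched as follows. This assumes GNU stat (as on Amazon Linux 2) and that the prometheus user and group already exist. The owner_of and dir_owned_by functions are hypothetical helpers, and the commented commands need root.

```shell
# Print the owner of a path as user:group (GNU stat).
owner_of() {
  stat -c '%U:%G' "$1"
}

# Return 0 if the directory exists and is owned by the expected user:group.
dir_owned_by() {
  dir="$1"
  expected="$2"
  [ -d "$dir" ] && [ "$(owner_of "$dir")" = "$expected" ]
}

# Example, run as root on the instance:
#   if ! dir_owned_by /app/prometheus/data prometheus:prometheus; then
#     mkdir -p /app/prometheus/data
#     chown -R prometheus:prometheus /app/prometheus/data
#   fi
#   systemctl reset-failed prometheus   # clear the start-limit state
#   systemctl restart prometheus
```

The user data script in the question does not create /app/prometheus/data, so placing the mkdir and chown steps before systemctl start prometheus should avoid the failed first start.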

Regarding "prometheus - Unable to start the Prometheus server", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59137829/
