gpt4 book ai didi

docker - 如何对失败的Docker任务进行故障排除

转载 作者:行者123 更新时间:2023-12-02 20:02:14 25 4
gpt4 key购买 nike

我只是在docker世界中起步,关于一切组织方式的许多(基本)原则仍不清楚。请帮助我了解如何处理失败的docker任务。

我的docker服务无法正常工作,但这是第二个问题。主要问题在于,如何解决该问题尚不清楚。

This is my docker-compose file,由应用程序服务和mongodb服务组成。应用程序服务将日志写入/opt/myapp/log/app.log。完整的源代码可以在here中找到。我还建了一个corresponding docker image and uploaded it to the dockerhub

让我们开始堆栈:

docker swarm init
Swarm initialized: current node (xpkngdn0vpr73nioalzbkem1k) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-6109nv6pn7eb9gtam8bq4m198k5sk7ztzf7hy7yfv5c47kcrmq-9fbrmmccd977kx22mivs7segn 192.168.65.3:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

docker stack deploy -c docker-compose.yml myapp
Creating network myapp_default
Creating service myapp_web
Creating service myapp_db

之后,让我们等待一会儿(〜1分钟),然后继续:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
31e9f3a8f5aa deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" 31 seconds ago Up 25 seconds 8090/tcp myapp_web.1.sij3z7cbbynsxos6608ru2f8a
9fc6a5868c12 mongo:latest "docker-entrypoint.s…" About a minute ago Up About a minute 27017/tcp myapp_db.1.gxl5xwj1tg80nr16clbskk2oc
a3ff2ba0c8c5 deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" About a minute ago Exited (137) 32 seconds ago myapp_web.1.3dv8x2dx6kig4qkf1wc2axro8

我们看到有一个失败的任务。让我们尝试了解出了什么问题:
docker commit a3ff2ba0c8c5 snapshot
sha256:bec4756cadebbada400b4d1037cac671168396bf73b7d3e875c6f98f63522afd

docker run --rm -it snapshot /bin/sh
/opt/myapp # cat /opt/myapp/log/app.log
2018-10-05 15:34:20 - Starting Start on a3ff2ba0c8c5 with PID 1 (/opt/myapp/lib/myapp.jar started by root in /opt/myapp)
2018-10-05 15:34:21 - No active profile set, falling back to default profiles: default

任务的日志不包含足以解决问题的目标信息。但是,当我们独立运行应用程序镜像时,该数据可用:
docker run --rm -d deniszhdanov/docker-swarm-troobleshoot-service:1
825a818b425feb7ed1f593c14a411efb68457aee9c6bfcf27f745fd58cfa0001

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
825a818b425f deniszhdanov/docker-swarm-troobleshoot-service:1 "java -jar /opt/myap…" 46 seconds ago Up 44 seconds 8090/tcp zen_bardeen

docker exec -it 825a818b425f /bin/sh
/opt/myapp # cat /opt/myapp/log/app.log
2018-10-05 15:30:09 - Starting Start on 825a818b425f with PID 1 (/opt/myapp/lib/myapp.jar started by root in /opt/myapp)
2018-10-05 15:30:09 - No active profile set, falling back to default profiles: default
2018-10-05 15:30:09 - Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@18ef96: startup date <Fri Oct 05 15:30:09 GMT 2018>; root of context hierarchy
2018-10-05 15:30:10 - Cluster created with settings {hosts=<db:27017>, mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2018-10-05 15:30:10 - Adding discovered server db:27017 to client view of cluster
2018-10-05 15:30:11 - Exception in monitor thread while connecting to server db:27017
com.mongodb.MongoSocketException: db: Name does not resolve
at com.mongodb.ServerAddress.getSocketAddress(ServerAddress.java:188)
at com.mongodb.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:59)
at com.mongodb.connection.SocketStream.open(SocketStream.java:57)
at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:126)
at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:114)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: db: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
at java.net.InetAddress.getByName(InetAddress.java:1076)
at com.mongodb.ServerAddress.getSocketAddress(ServerAddress.java:186)
... 5 common frames omitted

呵呵,花了一段时间描述了所有这些,这要归功于每个设法做到这一点的人:)

问题:
  • 为什么独立容器和任务的状态不同?
  • 建议使用什么方法来排除失败的任务

  • 问候,丹尼斯

    最佳答案

    最终,我找到了这个docker troubleshooting page,检查了docker日志并发现了这一点:

    2018-10-06 11:58:29.662461+0800  localhost com.docker.hyperkit[583]: [91168.810550] CPU: 3 PID: 50578 Comm: java Not tainted 4.9.93-linuxkit-aufs #1
    2018-10-06 11:58:29.663013+0800 localhost com.docker.hyperkit[583]: [91168.811356] Hardware name: BHYVE, BIOS 1.00 03/14/2014
    2018-10-06 11:58:29.663984+0800 localhost com.docker.hyperkit[583]: [91168.811909] 0000000000000000 ffffffffa243922a ffffbb9dc0763de8 ffff937eb67aed00
    2018-10-06 11:58:29.664792+0800 localhost com.docker.hyperkit[583]: [91168.812878] ffffffffa21f5d85 0000000000000000 0000000000000000 ffffbb9dc0763de8
    2018-10-06 11:58:29.665681+0800 localhost com.docker.hyperkit[583]: [91168.813694] ffff937e6df576a0 0000000000000202 ffffffffa27f9dae ffff937eb67aed00
    2018-10-06 11:58:29.666082+0800 localhost com.docker.hyperkit[583]: [91168.814585] Call Trace:
    2018-10-06 11:58:29.666638+0800 localhost com.docker.hyperkit[583]: [91168.814984] [<ffffffffa243922a>] ? dump_stack+0x5a/0x6f
    2018-10-06 11:58:29.667238+0800 localhost com.docker.hyperkit[583]: [91168.815534] [<ffffffffa21f5d85>] ? dump_header+0x78/0x1ed
    2018-10-06 11:58:29.667980+0800 localhost com.docker.hyperkit[583]: [91168.816144] [<ffffffffa27f9dae>] ? _raw_spin_unlock_irqrestore+0x16/0x18
    2018-10-06 11:58:29.668686+0800 localhost com.docker.hyperkit[583]: [91168.816878] [<ffffffffa21a1f90>] ? oom_kill_process+0x83/0x324
    2018-10-06 11:58:29.669273+0800 localhost com.docker.hyperkit[583]: [91168.817583] [<ffffffffa21a25b7>] ? out_of_memory+0x239/0x267
    2018-10-06 11:58:29.669944+0800 localhost com.docker.hyperkit[583]: [91168.818162] [<ffffffffa21ef2cd>] ? mem_cgroup_out_of_memory+0x4b/0x79
    2018-10-06 11:58:29.670652+0800 localhost com.docker.hyperkit[583]: [91168.818834] [<ffffffffa21f34a6>] ? mem_cgroup_oom_synchronize+0x26b/0x294
    2018-10-06 11:58:29.671338+0800 localhost com.docker.hyperkit[583]: [91168.819560] [<ffffffffa21ef650>] ? mem_cgroup_is_descendant+0x48/0x48
    2018-10-06 11:58:29.671982+0800 localhost com.docker.hyperkit[583]: [91168.820253] [<ffffffffa21a2612>] ? pagefault_out_of_memory+0x2d/0x6f
    2018-10-06 11:58:29.672615+0800 localhost com.docker.hyperkit[583]: [91168.820886] [<ffffffffa20459b0>] ? __do_page_fault+0x3c6/0x45f
    2018-10-06 11:58:29.673158+0800 localhost com.docker.hyperkit[583]: [91168.821516] [<ffffffffa27fb3c8>] ? page_fault+0x28/0x30
    2018-10-06 11:58:29.677517+0800 localhost com.docker.hyperkit[583]: [91168.822159] Task in /docker/bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235 killed as a result of limit of /docker/bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235
    2018-10-06 11:58:29.678200+0800 localhost com.docker.hyperkit[583]: [91168.826457] memory: usage 51188kB, limit 51200kB, failcnt 8764
    2018-10-06 11:58:29.678913+0800 localhost com.docker.hyperkit[583]: [91168.827160] memory+swap: usage 102400kB, limit 102400kB, failcnt 78
    2018-10-06 11:58:29.679469+0800 localhost com.docker.hyperkit[583]: [91168.827815] kmem: usage 884kB, limit 9007199254740988kB, failcnt 0
    2018-10-06 11:58:29.681923+0800 localhost com.docker.hyperkit[583]: [91168.828391] Memory cgroup stats for /docker/bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235: cache:20KB rss:50284KB rss_huge:0KB mapped_file:4KB dirty:8KB writeback:0KB swap:51212KB inactive_anon:25264KB active_anon:25020KB inactive_file:8KB active_file:8KB unevictable:0KB
    2018-10-06 11:58:29.682630+0800 localhost com.docker.hyperkit[583]: [91168.830820] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
    2018-10-06 11:58:29.683423+0800 localhost com.docker.hyperkit[583]: [91168.831586] [50168] 0 50168 499844 14545 94 5 12964 0 java
    2018-10-06 11:58:29.684186+0800 localhost com.docker.hyperkit[583]: [91168.832329] Memory cgroup out of memory: Kill process 50168 (java) score 1078 or sacrifice child
    2018-10-06 11:58:29.685451+0800 localhost com.docker.hyperkit[583]: [91168.833172] Killed process 50168 (java) total-vm:1999376kB, anon-rss:49108kB, file-rss:9072kB, shmem-rss:0kB
    2018-10-06 11:58:29.970702+0800 localhost com.docker.hyperkit[583]: [91169.119073] oom_reaper: reaped process 50168 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
    2018-10-06 11:58:30.206071+0800 localhost com.docker.driver.amd64-linux[579]: osxfs: die event: de-registering container bf5d7ef29816596e58e25b05e5bde1f57531e02fe31317d6a1dbad477580b235

    即原因是我无意中从docker-compose教程之一中复制了资源/限制/内存设置,并且docker因此不断杀死我的应用程序。

    归根结底,问题是微不足道的,故障排除也不是真正的负担。只需查看docker守护程序日志。仅由于我对Docker的经验不足,我花了一个晚上试图从Docker容器/群(服务日志,容器日志等)中查找根。好吧,实践很完美:)

    关于docker - 如何对失败的Docker任务进行故障排除,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52669938/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com