
Airflow user impersonation (run_as_user) not working

Reposted. Author: 行者123. Last updated: 2023-12-05 07:06:58

I am trying to use the run_as_user feature in Airflow for our DAGs, but we are running into problems. Any help or suggestions?

DAG code:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Start date: one day ago, truncated to whole seconds.
start_date = (datetime.now() - timedelta(days=1)).replace(microsecond=0)

default_args = {
    'start_date': start_date,
    'run_as_user': 'airflowaduser',  # OS user the task should be executed as
    'execution_timeout': timedelta(minutes=5),
}

dag = DAG(
    'test_run-as_user',
    default_args=default_args,
    description='Run hive Query DAG',
    schedule_interval='0 * * * *',  # hourly
)

# Print the effective user to verify that impersonation works.
hive_ex = BashOperator(
    task_id='hive-ex',
    bash_command='whoami',
    dag=dag,
)

I have added airflow to sudoers, and from the Linux shell it can switch to airflowaduser without a password.

airflow ALL=(ALL) NOPASSWD: ALL

The error details when running the DAG are as follows:

*** Reading local file: /home/airflow/logs/test_run-as_user/hive-ex/2020-06-09T16:00:00+00:00/1.log
[2020-06-09 17:00:04,602] {taskinstance.py:620} INFO - Dependencies all met for <TaskInstance: test_run-as_user.hive-ex 2020-06-09T16:00:00+00:00 [queued]>
[2020-06-09 17:00:04,613] {taskinstance.py:620} INFO - Dependencies all met for <TaskInstance: test_run-as_user.hive-ex 2020-06-09T16:00:00+00:00 [queued]>
[2020-06-09 17:00:04,613] {taskinstance.py:838} INFO -
--------------------------------------------------------------------------------
[2020-06-09 17:00:04,613] {taskinstance.py:839} INFO - Starting attempt 1 of 1
[2020-06-09 17:00:04,613] {taskinstance.py:840} INFO -
--------------------------------------------------------------------------------
[2020-06-09 17:00:04,651] {taskinstance.py:859} INFO - Executing <Task(BashOperator): hive-ex> on 2020-06-09T16:00:00+00:00
[2020-06-09 17:00:04,651] {base_task_runner.py:133} INFO - Running: ['sudo', '-E', '-H', '-u', 'airflowaduser', 'airflow', 'run', 'test_run-as_user', 'hive-ex', '2020-06-09T16:00:00+00:00', '--job_id', '2314', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/test_run-as_user/testscript.py', '--cfg_path', '/tmp/tmpbinlgw54']
[2020-06-09 17:00:04,664] {base_task_runner.py:115} INFO - Job 2314: Subtask hive-ex sudo: airflow: command not found
[2020-06-09 17:00:09,576] {logging_mixin.py:95} INFO - [2020-06-09 17:00:09,575] {local_task_job.py:105} INFO - Task exited with return code 1

Our Airflow runs in a virtual environment.

Best answer

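When Airflow runs in a virtual environment, only the user "airflow" is set up to run the airflow command. To run tasks as another user, set that user's home directory to the same one as the airflow user (/home/airflow) and make the user a member of group 0. See [https://airflow.apache.org/docs/docker-stack/entrypoint.html#allowing-arbitrary-user-to-run-the-container]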

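In addition, the run_as_user feature invokes sudo, which only searches the directories listed in its secure path. The directory containing the airflow command is not part of the secure path, but it can be added in the sudoers file. You can run whereis airflow to find where the airflow command is installed; in my container it is /home/airflow/.local/bin.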
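One way to confirm this diagnosis is to resolve the airflow command against the same directory list that sudo would search. The sketch below is only an illustration and not part of the original answer: resolve_in_secure_path is a hypothetical helper, and the secure path value shown is an assumed example that should be replaced with the one from your own sudoers file.

```python
import shutil


def resolve_in_secure_path(command, secure_path):
    """Return the full path of `command` if it is found in `secure_path`.

    `secure_path` is a colon-separated directory list, in the same form as
    the value of `Defaults secure_path` in sudoers. The result is None when
    the command is not found, which corresponds to the log message
    "sudo: airflow: command not found".
    """
    return shutil.which(command, path=secure_path)


if __name__ == "__main__":
    # Assumed example value; copy the real one from your sudoers file.
    current = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    # Directory reported by `whereis airflow` in the answer's container.
    airflow_bin = "/home/airflow/.local/bin"

    candidates = (
        ("current", current),
        ("extended", airflow_bin + ":" + current),
    )
    for label, path in candidates:
        found = resolve_in_secure_path("airflow", path)
        print(f"{label}: {found or 'airflow not found'}")
```

If the "current" line reports that airflow is not found while the "extended" line prints a path, the secure path is the likely cause of the failure.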

To fix this, I needed to add 4 lines to my Dockerfile:

# Create airflowaduser in group 0, sharing the airflow user's home directory
RUN useradd -u [airflowaduser UID] -g 0 -d /home/airflow airflowaduser && \
    # Add airflow to the sudo group
    usermod -u [airflow UID] -aG sudo airflow && \
    # Allow airflow to run sudo without a password
    echo "airflow ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
    # Extend the secure path to include the directory of the airflow command
    sed -i 's#/.venv/bin#/home/airflow/.local/bin:/.venv/bin#' /etc/sudoers
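
The sed expression in the last step rewrites the secure path entry in place. The following sketch previews the same substitution in Python before it is baked into an image. The helper extend_secure_path is hypothetical, and the sample sudoers line is an assumption modeled on the /.venv/bin prefix that the answer's pattern targets; if your sudoers file looks different, the pattern needs adjusting.

```python
import re


def extend_secure_path(sudoers_line,
                       pattern=r"/.venv/bin",
                       replacement="/home/airflow/.local/bin:/.venv/bin"):
    """Mimic `sed 's#pattern#replacement#'` on a single line.

    As with sed when no `g` flag is given, only the first match is replaced.
    The unescaped `.` matches any character, as it does in the sed pattern.
    """
    return re.sub(pattern, replacement, sudoers_line, count=1)


# Assumed sample line; the real one comes from /etc/sudoers in your image.
sample = 'Defaults secure_path="/.venv/bin:/usr/local/bin:/usr/bin:/bin"'
print(extend_secure_path(sample))
```

Applying the substitution a second time would insert the directory again, so it is best suited to a step that runs once, such as a Dockerfile build.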

Regarding "Airflow user impersonation (run_as_user) not working", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62303665/
