
Azure Databricks : create audit trail for who ran what query at what moment

Reposted. Author: 行者123. Updated: 2023-12-02 06:41:20

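We have an audit requirement to provide insight into who executed which query at what moment in Azure Databricks. The Jobs tab of the Azure Databricks Spark UI already lists the Spark jobs that were executed, including the query that ran and the time it was submitted. It does not, however, show who executed the query.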

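  1. Is there an API we can use with Azure Databricks to query the Spark job details shown in the UI? (The Databricks REST API does not seem to provide this, but maybe I am overlooking something.)
  2. Is there any way, using an API, to determine who created a Spark job?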

Thanks.

Best answer

1. Accessing the Spark API

a. Access to the Azure Databricks Spark API from the driver node (internal):

import requests

driverIp = spark.conf.get('spark.driver.host')
port = spark.conf.get("spark.ui.port")
url = F"http://{driverIp}:{port}/api/v1/applications"
r = requests.get(url, timeout=3.0)
r.status_code, r.text

Use the internal route if, for example, the public API returns this error message: PERMISSION_DENIED: Traffic on this port is not permitted
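As a hedged extension of the driver-side call above, the sketch below walks from the application list to each application's jobs. The helper names `http_get_json` and `list_jobs_internal` are hypothetical, and the code uses the standard library's `urllib` instead of `requests` only so that it has no extra dependency. The `get_json` parameter is injectable so the logic can be exercised without a live cluster.

```python
import json
from typing import Any, Callable, List
from urllib.request import urlopen


def http_get_json(url: str, timeout: float = 3.0) -> Any:
    """Fetch a URL and decode the JSON body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_jobs_internal(
    driver_ip: str,
    port: str,
    get_json: Callable[[str], Any] = http_get_json,
) -> List[dict]:
    """Collect the jobs of every application the driver's Spark API reports."""
    base = f"http://{driver_ip}:{port}/api/v1/applications"
    jobs: List[dict] = []
    for app in get_json(base):
        # Each application entry carries an "id" used in the jobs URL.
        jobs.extend(get_json(f"{base}/{app['id']}/jobs"))
    return jobs


# In a notebook on the driver, the call would look like this (not run here):
# jobs = list_jobs_internal(spark.conf.get("spark.driver.host"),
#                           spark.conf.get("spark.ui.port"))
```

Like the snippet above, this only works from inside the cluster, because the driver's UI port is not reachable from outside.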

b. External access to the Azure Databricks Spark API:

import requests
import json
"""
Program access to Databricks Spark UI.

Works external to Databricks environment or running within.
Requires a Personal Access Token. Treat this like a password, do not store in a notebook. Please refer to the Secrets API.
This Python code requires F string support.

"""

# URL pattern:
# https://<databricks-host>/driver-proxy-api/o/0/<cluster_id>/<port>/api/v1/applications/<application-id-from-master-spark-ui>/stages/<stage-id>

# Inside a notebook these two values can be read from the Spark conf.
# Outside Databricks there is no `spark` object, so set them by hand.
port = spark.conf.get("spark.ui.port")
clusterId = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
host = "eastus2.azuredatabricks.net"
workspaceId = "999999999999111"  # follows the 'o=' in the Databricks URLs, or zero
token = "dapideedeadbeefdeadbeefdeadbeef68ee3"  # Personal Access Token placeholder; load the real one from a secret

url = F"https://{host}/driver-proxy-api/o/{workspaceId}/{clusterId}/{port}/api/v1/applications/?status=running"
r = requests.get(url, auth=("token", token))

# print Application list response
print(r.status_code, r.text)

applicationId = r.json()[0]['id'] # assumes only one response

url = F"https://{host}/driver-proxy-api/o/{workspaceId}/{clusterId}/{port}/api/v1/applications/{applicationId}/jobs"
r = requests.get(url, auth=("token", token))

print(r.status_code, r.json())
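The jobs response printed above is a JSON list with one record per Spark job. A minimal sketch of reducing it to an audit-style listing follows. It assumes the usual Spark monitoring API fields `jobId`, `name`, `description`, `submissionTime`, and `status`. The function name `summarize_jobs` and the sample records are hypothetical, and whether `description` holds the query text depends on how the job was submitted. None of these fields identifies the user, which matches point 2 below.

```python
from typing import Any, Dict, List


def summarize_jobs(jobs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the fields useful for a 'what ran when' listing."""
    rows = []
    for job in jobs:
        rows.append({
            "jobId": job.get("jobId"),
            "submissionTime": job.get("submissionTime") or "",
            "status": job.get("status"),
            # Prefer the description when present; fall back to the job name.
            "query": job.get("description") or job.get("name", ""),
        })
    # Timestamps in one fixed-width format sort correctly as strings.
    rows.sort(key=lambda row: row["submissionTime"])
    return rows


# Hypothetical sample shaped like a jobs response; not real cluster output.
sample_jobs = [
    {"jobId": 2, "name": "collect at <command>", "status": "SUCCEEDED",
     "submissionTime": "2020-05-20T10:05:00.000GMT"},
    {"jobId": 1, "name": "count at <command>", "description": "SELECT 1",
     "status": "SUCCEEDED", "submissionTime": "2020-05-20T10:00:00.000GMT"},
]

for row in summarize_jobs(sample_jobs):
    print(row)
```

With the code above, `summarize_jobs(r.json())` would produce the same listing from the live response.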

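2. Sorry, no, not at this time.

You can look at the cluster logs, but the user identity is not in them.

Vote for and track this idea: https://ideas.databricks.com/ideas/DBE-I-313
How to access the Ideas Portal: https://docs.databricks.com/ideas.html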

Regarding "Azure Databricks : create audit trail for who ran what query at what moment", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61910208/
