
java - Upload data from local to Azure ADLS Gen2 using Python or Java

Reposted. Author: 行者123. Updated: 2023-12-02 01:23:32

I have an Azure storage account with Data Lake Gen2. I want to upload data from my local machine to the Lake Gen2 file system using Python (or Java).

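I have found examples of how to interact with file shares in a storage account, but I have not yet found out how to upload to the lake (as opposed to a file share). I also found how to do this for Gen1 lakes here, but for Gen2 there is nothing other than closed requests.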

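My question is whether this is possible with Python as of today. Alternatively, how can I upload a file to a Gen2 lake using Java? A code snippet demonstrating the upload API calls would be much appreciated.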

Best answer

According to the official tutorial Quickstart: Upload, download, and list blobs with Python, quoted below, you cannot use the Azure Storage SDK for Python directly for any operation on Azure Data Lake Store Gen 2 unless you have enrolled in the public preview of multi-protocol access on Data Lake Storage.

Note

The features described in this article are available to accounts that have a hierarchical namespace only if you enroll in the public preview of multi-protocol access on Data Lake Storage. To review limitations, see the known issues article.

So the only solution for uploading data to ADLS Gen2 is to use the ADLS Gen2 REST API; see its reference, Azure Data Lake Store REST API.
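
As a rough orientation, the REST calls used in this answer all go to the account's dfs endpoint and differ only in HTTP method and query string. The sketch below builds those URLs with the standard library alone. The helper name dfs_url and the sample account, file system, and path names are made up for illustration and are not part of any SDK.

```python
from urllib.parse import quote, urlencode


def dfs_url(account_name, fs_name, path="", **params):
    """Build a URL on the Data Lake Storage Gen2 dfs endpoint (hypothetical helper)."""
    url = f"https://{account_name}.dfs.core.windows.net/{quote(fs_name)}"
    if path:
        # Keep "/" unescaped so nested directories remain path segments.
        url += "/" + quote(path.strip("/"), safe="/")
    if params:
        url += "?" + urlencode(params)
    return url


# The five requests used by this answer's script, as (HTTP method, URL) pairs.
# "myaccount", "myfs", and "mydir/data.csv" are placeholder names.
steps = [
    ("PUT", dfs_url("myaccount", "myfs", resource="filesystem")),
    ("PUT", dfs_url("myaccount", "myfs", "mydir", resource="directory")),
    ("PUT", dfs_url("myaccount", "myfs", "mydir/data.csv", resource="file")),
    ("PATCH", dfs_url("myaccount", "myfs", "mydir/data.csv", action="append", position=0)),
    ("PATCH", dfs_url("myaccount", "myfs", "mydir/data.csv", action="flush", position=1024)),
]

for method, url in steps:
    print(method, url)
```

Every request also needs an Authorization: Bearer header carrying an OAuth token. The position passed to flush is the total number of bytes appended (1024 here is just a placeholder).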

Here is my sample code for uploading data to ADLS Gen2 with Python; it works fine.

import requests
import json

def auth(tenant_id, client_id, client_secret):
    # Request an OAuth2 token via the client credentials flow.
    print('auth')
    auth_headers = {
        "Content-Type": "application/x-www-form-urlencoded"
    }
    auth_body = {
        "client_id": client_id,
        "client_secret": client_secret,
        "scope" : "https://storage.azure.com/.default",
        "grant_type" : "client_credentials"
    }
    resp = requests.post(f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token", headers=auth_headers, data=auth_body)
    return (resp.status_code, json.loads(resp.text))

def mkfs(account_name, fs_name, access_token):
    # Create a filesystem.
    print('mkfs')
    fs_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.put(f"https://{account_name}.dfs.core.windows.net/{fs_name}?resource=filesystem", headers=fs_headers)
    return (resp.status_code, resp.text)

def mkdir(account_name, fs_name, dir_name, access_token):
    # Create a directory inside the filesystem.
    print('mkdir')
    dir_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.put(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{dir_name}?resource=directory", headers=dir_headers)
    return (resp.status_code, resp.text)

def touch_file(account_name, fs_name, dir_name, file_name, access_token):
    # Create an empty file.
    print('touch_file')
    touch_file_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.put(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{dir_name}/{file_name}?resource=file", headers=touch_file_headers)
    return (resp.status_code, resp.text)

def append_file(account_name, fs_name, path, content, position, access_token):
    # Upload content starting at the given byte position (not committed until flushed).
    print('append_file')
    append_file_headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "text/plain",
        "Content-Length": f"{len(content)}"
    }
    resp = requests.patch(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{path}?action=append&position={position}", headers=append_file_headers, data=content)
    return (resp.status_code, resp.text)

def flush_file(account_name, fs_name, path, position, access_token):
    # Commit the appended data; position is the total length of the file.
    print('flush_file')
    flush_file_headers = {
        "Authorization": f"Bearer {access_token}"
    }
    resp = requests.patch(f"https://{account_name}.dfs.core.windows.net/{fs_name}/{path}?action=flush&position={position}", headers=flush_file_headers)
    return (resp.status_code, resp.text)

def mkfile(account_name, fs_name, dir_name, file_name, local_file_name, access_token):
    # Create the remote file, then append the local file's content and flush it.
    print('mkfile')
    status_code, result = touch_file(account_name, fs_name, dir_name, file_name, access_token)
    if status_code == 201:
        with open(local_file_name, 'rb') as local_file:
            path = f"{dir_name}/{file_name}"
            content = local_file.read()
            position = 0
            append_file(account_name, fs_name, path, content, position, access_token)
            position = len(content)
            flush_file(account_name, fs_name, path, position, access_token)
    else:
        print(result)


if __name__ == '__main__':
    tenant_id = '<your tenant id>'
    client_id = '<your client id>'
    client_secret = '<your client secret>'

    account_name = '<your adls account name>'
    fs_name = '<your filesystem name>'
    dir_name = '<your directory name>'
    file_name = '<your file name>'
    local_file_name = '<your local file name>'

    # Acquire an access token
    auth_status_code, auth_result = auth(tenant_id, client_id, client_secret)
    access_token = auth_result['access_token'] if auth_status_code == 200 else ''
    print(access_token)

    # Create a filesystem
    mkfs_status_code, mkfs_result = mkfs(account_name, fs_name, access_token)
    print(mkfs_status_code, mkfs_result)

    # Create a directory
    mkdir_status_code, mkdir_result = mkdir(account_name, fs_name, dir_name, access_token)
    print(mkdir_status_code, mkdir_result)

    # Create a file from a local file
    mkfile(account_name, fs_name, dir_name, file_name, local_file_name, access_token)
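
The script above reads the whole local file into memory and sends it in a single append. For larger files, one possible variation is to append fixed-size chunks at increasing positions and flush once at the end. The following is only a sketch of that loop: upload_in_chunks, the append and flush callables, and the 4 MiB default chunk size are assumptions made for illustration, and the demo uses in-memory stand-ins rather than real HTTP requests.

```python
import io


def upload_in_chunks(stream, append, flush, chunk_size=4 * 1024 * 1024):
    """Append `stream` chunk by chunk, then flush once at the final length.

    `append(chunk, position)` and `flush(position)` are hypothetical callables
    that send the corresponding request and return its HTTP status code.
    """
    position = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        status = append(chunk, position)
        if not 200 <= status < 300:
            raise RuntimeError(f"append at position {position} failed with HTTP {status}")
        # The next chunk starts where this one ended.
        position += len(chunk)
    status = flush(position)
    if not 200 <= status < 300:
        raise RuntimeError(f"flush at position {position} failed with HTTP {status}")
    return position


# Demo with in-memory stand-ins that only record what would be sent.
calls = []


def fake_append(chunk, position):
    calls.append(("append", position, len(chunk)))
    return 202


def fake_flush(position):
    calls.append(("flush", position))
    return 200


total = upload_in_chunks(io.BytesIO(b"x" * 10), fake_append, fake_flush, chunk_size=4)
print(total, calls)
```

To wire this to the functions above, append could be something like lambda chunk, pos: append_file(account_name, fs_name, path, chunk, pos, access_token)[0], and flush similarly with flush_file, since both return a (status_code, text) tuple.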

Hope it helps.

Regarding java - Upload data from local to Azure ADLS Gen2 using Python or Java, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57293006/
