
google-cloud-platform - Google Cloud Storage file system, Python package error: AttributeError: 'GCSFile' object has no attribute 'gcsfs'

Reposted. Author: 行者123. Updated: 2023-12-05 03:19:41

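I am trying to run Python code that downloads data in chunks from a source URL and streams it to a destination Cloud Storage blob. It works fine on a standalone PC, in local functions, and so on. But when I run it on GCP Cloud Run, it throws a strange error.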

AttributeError: 'GCSFile' object has no attribute 'gcsfs'

Full error:

Traceback (most recent call last):
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1683, in __del__
    self.close()
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1661, in close
    self.flush(force=True)
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1527, in flush
    self._initiate_upload()
  File "/home/<user>/.local/lib/python3.9/site-packages/gcsfs/core.py", line 1443, in _initiate_upload
    self.gcsfs.loop,
AttributeError: 'GCSFile' object has no attribute 'gcsfs'

This has cost me a week. Any help or guidance is much appreciated. Thanks in advance.

The code actually used:

from flask import Flask, request
import os
import gcsfs
import requests

app = Flask(__name__)


@app.route('/urltogcs')
def urltogcs():
    try:
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "secret.json"
        gcp_file_system = gcsfs.GCSFileSystem(project='<project_id>')
        session = requests.Session()
        url = request.args.get('source', 'temp')
        blob_path = request.args.get('destination', 'temp')
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with gcp_file_system.open(blob_path, 'wb') as f_obj:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f_obj.write(chunk)
        return f'Successfully downloaded from {url} to {blob_path} :)'
    except Exception as e:
        print("Failure")
        print(e)
        return f'download failed for {url} :('


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

Best answer

Your code (with the suggested changes) works for me:

main.py:

from flask import Flask, request
import os
import gcsfs
import requests

app = Flask(__name__)

# Both values come from the environment; Cloud Run sets PORT itself.
project = os.getenv("PROJECT")
port = os.getenv("PORT", 8080)


@app.route('/urltogcs')
def urltogcs():
    try:
        gcp_file_system = gcsfs.GCSFileSystem(project=project)
        session = requests.Session()
        url = request.args.get('source', 'temp')
        blob_path = request.args.get('destination', 'temp')
        # Stream the source in 1 MiB chunks straight into the GCS object.
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with gcp_file_system.open(blob_path, 'wb') as f_obj:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f_obj.write(chunk)
        return f'Successfully downloaded from {url} to {blob_path} :)'
    except Exception as e:
        print("Failure")
        print(e)
        return f'download failed for {url} :('


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(port))

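Note: the code requires project from the environment, which isn't ideal. It would be better if gcsfs.GCSFileSystem didn't require project. Alternatively, project could be obtained from Google's metadata service. For convenience (!), I'm setting it through the environment.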
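The metadata-service alternative mentioned in the note could look roughly like the sketch below. It is not part of the original answer: the helper names `fetch_project_from_metadata` and `resolve_project` are hypothetical, and the code assumes the standard metadata endpoint for the project ID, which is only reachable from inside Google Cloud.

```python
import os
import urllib.request

# Standard metadata-server path for the project ID (reachable only on Google Cloud).
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/project/project-id"
)


def fetch_project_from_metadata(timeout=2.0):
    """Hypothetical helper: ask the metadata server for the project ID."""
    req = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def resolve_project(env=os.environ, fetch=fetch_project_from_metadata):
    """Hypothetical helper: prefer PROJECT from the environment,
    then fall back to the metadata server, else return None."""
    project = env.get("PROJECT")
    if project:
        return project
    try:
        return fetch()
    except OSError:
        # URLError and socket timeouts are OSError subclasses; this is the
        # expected outcome when running outside Google Cloud.
        return None
```

In main.py, `project = resolve_project()` would then replace `os.getenv("PROJECT")`, so the `--set-env-vars` flag becomes optional on Cloud Run while local runs can still set PROJECT by hand.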

requirements.txt:

Flask==2.2.2
gcsfs==2022.7.1
gunicorn==20.1.0

Dockerfile:

FROM python:3.10-slim

ENV PYTHONUNBUFFERED True

ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

RUN pip install --no-cache-dir -r requirements.txt

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

Bash script:

BILLING="[YOUR-BILLING]"
PROJECT="[YOUR-PROJECT]"
REGION="[YOUR-REGION]"
BUCKET="[YOUR-BUCKET]"

# Create Project
gcloud projects create ${PROJECT}

# Associate with Billing Account
gcloud beta billing projects link ${PROJECT} \
  --billing-account=${BILLING}

# Enable services
SERVICES=(
  "artifactregistry"
  "cloudbuild"
  "run"
)
for SERVICE in ${SERVICES[@]}
do
  gcloud services enable ${SERVICE}.googleapis.com \
    --project=${PROJECT}
done

# Create Bucket
gsutil mb -p ${PROJECT} gs://${BUCKET}

# Service Account
ACCOUNT=tester
EMAIL=${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com

# Create Service Account
gcloud iam service-accounts create ${ACCOUNT} \
  --project=${PROJECT}

# Create Service Account key
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
  --iam-account=${EMAIL} \
  --project=${PROJECT}

# Ensure Service Account can write to storage
gcloud projects add-iam-policy-binding ${PROJECT} \
  --role=roles/storage.admin \
  --member=serviceAccount:${EMAIL}

# Only needed for local testing
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json

# Deploy Cloud Run service
# Run service as Service Account
NAME="urltogcs"
gcloud run deploy ${NAME} \
  --source=${PWD} \
  --set-env-vars=PROJECT=${PROJECT} \
  --no-allow-unauthenticated \
  --service-account=${EMAIL} \
  --region=${REGION} \
  --project=${PROJECT}

# Grab the Cloud Run service's endpoint
ENDPOINT=$(gcloud run services describe ${NAME} \
  --region=${REGION} \
  --project=${PROJECT} \
  --format="value(status.url)")

# Cloud Run service requires auth
TOKEN=$(gcloud auth print-identity-token)

# This page
SRC="https://stackoverflow.com/questions/73393808/"

# Generate a GCS Object name by epoch
DST="gs://${BUCKET}/$(date +%s)"

curl \
  --silent \
  --get \
  --header "Authorization: Bearer ${TOKEN}" \
  --data-urlencode "source=${SRC}" \
  --data-urlencode "destination=${DST}" \
  --write-out '%{response_code}' \
  --output /dev/null \
  ${ENDPOINT}/urltogcs

Yields:

200

And:

gsutil ls gs://${BUCKET}

gs://${BUCKET}/1660780270
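A side note on the traceback in the question: it starts in `__del__`, which suggests the `GCSFile` was being cleaned up, possibly after its constructor raised before the `gcsfs` attribute was assigned. That reading is an assumption, not something confirmed above. The sketch below uses a hypothetical `HalfBuiltFile` class (not gcsfs code) to show how a half-built object produces a secondary AttributeError during cleanup, while the exception raised by the constructor is the real cause.

```python
class HalfBuiltFile:
    """Hypothetical stand-in for a buffered file object; not gcsfs code."""

    def __init__(self, fs, fail=False):
        if fail:
            # Simulate a constructor that raises before attributes are set.
            raise PermissionError("simulated failure while opening")
        self.fs = fs
        self.closed = False

    def close(self):
        # Reads an attribute that exists only if __init__ ran to completion.
        _ = self.fs
        self.closed = True

    def __del__(self):
        # Runs even for objects whose __init__ raised part-way through.
        # Python reports (but does not propagate) errors raised here.
        self.close()


def open_and_report(fail):
    """Return a message naming the exception the caller actually receives."""
    try:
        HalfBuiltFile(fs="fake", fail=fail)
    except PermissionError as exc:
        return f"real cause: {exc}"
    return "ok"
```

Calling `open_and_report(True)` returns the constructor's error, and Python separately reports the AttributeError from `__del__` on stderr as an ignored exception. If the same thing happens on Cloud Run, the message printed by `print(e)` in the handler would be the one to investigate, not the `__del__` traceback.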

Regarding google-cloud-platform - Google Cloud Storage file system, Python package error: AttributeError: 'GCSFile' object has no attribute 'gcsfs', a similar question was found on Stack Overflow: https://stackoverflow.com/questions/73393808/
