
google-cloud-storage: Using gsutil to download data from AWS S3 to GCS

Reposted. Author: 行者123 · Updated: 2023-12-03 22:50:17

A collaborator of ours has made some data available on AWS, and I'm trying to pull it into our Google Cloud Storage bucket using gsutil (only some of the files are useful to us, so I don't want to use the GUI available on GCS). The collaborator provided us with the AWS bucket ID, an aws access key id, and an aws secret access key.

I looked through the documentation on GCE and edited the ~/.boto file to include the access keys. I restarted my terminal and tried an "ls", but got the following error:

gsutil ls s3://cccc-ffff-03210/
AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message></Error>

Is there something else I need to configure or run?

Thanks!

Edit:

Thanks for the responses!

I have the Cloud SDK installed and can access and run all gsutil commands on my Google Cloud Storage project. My problem is trying to access (e.g. with an "ls" command) the Amazon S3 bucket that was shared with me.

  • I uncommented two lines in the ~/.boto file and put in the access keys:
    # To add HMAC aws credentials for "s3://" URIs, edit and uncomment the
    # following two lines:
    aws_access_key_id = my_access_key
    aws_secret_access_key = my_secret_access_key


  • Output of "gsutil version -l":
    => gsutil version -l

    my_gc_id
    gsutil version: 4.27
    checksum: 5224e55e2df3a2d37eefde57 (OK)
    boto version: 2.47.0
    python version: 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]
    OS: Darwin 15.4.0
    multiprocessing available: True
    using cloud sdk: True
    pass cloud sdk credentials to gsutil: True
    config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
    gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
    compiled crcmod: True
    installed via package manager: False
    editable install: False


  • The output with the -DD option is:
    => gsutil -DD ls s3://my_amazon_bucket_id

    multiprocessing available: True
    using cloud sdk: True
    pass cloud sdk credentials to gsutil: True
    config path(s): /Users/pc/.boto, /Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto
    gsutil path: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil
    compiled crcmod: True
    installed via package manager: False
    editable install: False
    Command being run: /Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gsutil -o GSUtil:default_project_id=my_gc_id -DD ls s3://my_amazon_bucket_id
    config_file_list: ['/Users/pc/.boto', '/Users/pc/.config/gcloud/legacy_credentials/pc@gmail.com/.boto']
    config: [('debug', '0'), ('working_dir', '/mnt/pyami'), ('https_validate_certificates', 'True'), ('debug', '0'), ('working_dir', '/mnt/pyami'), ('content_language', 'en'), ('default_api_version', '2'), ('default_project_id', 'my_gc_id')]
    DEBUG 1103 08:42:34.664643 provider.py] Using access key found in shared credential file.
    DEBUG 1103 08:42:34.664919 provider.py] Using secret key found in shared credential file.
    DEBUG 1103 08:42:34.665841 connection.py] path=/
    DEBUG 1103 08:42:34.665967 connection.py] auth_path=/my_amazon_bucket_id/
    DEBUG 1103 08:42:34.666115 connection.py] path=/?delimiter=/
    DEBUG 1103 08:42:34.666200 connection.py] auth_path=/my_amazon_bucket_id/?delimiter=/
    DEBUG 1103 08:42:34.666504 connection.py] Method: GET
    DEBUG 1103 08:42:34.666589 connection.py] Path: /?delimiter=/
    DEBUG 1103 08:42:34.666668 connection.py] Data:
    DEBUG 1103 08:42:34.666724 connection.py] Headers: {}
    DEBUG 1103 08:42:34.666776 connection.py] Host: my_amazon_bucket_id.s3.amazonaws.com
    DEBUG 1103 08:42:34.666831 connection.py] Port: 443
    DEBUG 1103 08:42:34.666882 connection.py] Params: {}
    DEBUG 1103 08:42:34.666975 connection.py] establishing HTTPS connection: host=my_amazon_bucket_id.s3.amazonaws.com, kwargs={'port': 443, 'timeout': 70}
    DEBUG 1103 08:42:34.667128 connection.py] Token: None
    DEBUG 1103 08:42:34.667476 auth.py] StringToSign:
    GET


    Fri, 03 Nov 2017 12:42:34 GMT
    /my_amazon_bucket_id/
    DEBUG 1103 08:42:34.667600 auth.py] Signature:
    AWS RN8=
    DEBUG 1103 08:42:34.667705 connection.py] Final headers: {'Date': 'Fri, 03 Nov 2017 12:42:34 GMT', 'Content-Length': '0', 'Authorization': u'AWS AK6GJQ:EFVB8F7rtGN8=', 'User-Agent': 'Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0'}
    DEBUG 1103 08:42:35.179369 https_connection.py] wrapping ssl socket; CA certificate file=/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/third_party/boto/boto/cacerts/cacerts.txt
    DEBUG 1103 08:42:35.247599 https_connection.py] validating server certificate: hostname=my_amazon_bucket_id.s3.amazonaws.com, certificate hosts=['*.s3.amazonaws.com', 's3.amazonaws.com']
    send: u'GET /?delimiter=/ HTTP/1.1\r\nHost: my_amazon_bucket_id.s3.amazonaws.com\r\nAccept-Encoding: identity\r\nDate: Fri, 03 Nov 2017 12:42:34 GMT\r\nContent-Length: 0\r\nAuthorization: AWS AN8=\r\nUser-Agent: Boto/2.47.0 Python/2.7.10 Darwin/15.4.0 gsutil/4.27 (darwin) google-cloud-sdk/164.0.0\r\n\r\n'
    reply: 'HTTP/1.1 403 Forbidden\r\n'
    header: x-amz-bucket-region: us-east-1
    header: x-amz-request-id: 60A164AAB3971508
    header: x-amz-id-2: +iPxKzrW8MiqDkWZ0E=
    header: Content-Type: application/xml
    header: Transfer-Encoding: chunked
    header: Date: Fri, 03 Nov 2017 12:42:34 GMT
    header: Server: AmazonS3
    DEBUG 1103 08:42:35.326652 connection.py] Response headers: [('date', 'Fri, 03 Nov 2017 12:42:34 GMT'), ('x-amz-id-2', '+iPxKz1dPdgDxpnWZ0E='), ('server', 'AmazonS3'), ('transfer-encoding', 'chunked'), ('x-amz-request-id', '60A164AAB3971508'), ('x-amz-bucket-region', 'us-east-1'), ('content-type', 'application/xml')]
    DEBUG 1103 08:42:35.327029 bucket.py] <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>6097164508</RequestId><HostId>+iPxKzrWWZ0E=</HostId></Error>
    DEBUG: Exception stack trace:
    Traceback (most recent call last):
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 577, in _RunNamedCommandAndHandleExceptions
    collect_analytics=True)
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 317, in RunNamedCommand
    return_code = command_inst.RunCommand()
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/commands/ls.py", line 548, in RunCommand
    exp_dirs, exp_objs, exp_bytes = ls_helper.ExpandUrlAndPrint(storage_url)
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 180, in ExpandUrlAndPrint
    print_initial_newline=False)
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/ls_helper.py", line 252, in _RecurseExpandUrlAndPrint
    bucket_listing_fields=self.bucket_listing_fields):
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 476, in IterAll
    expand_top_level_buckets=expand_top_level_buckets):
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/wildcard_iterator.py", line 157, in __iter__
    fields=bucket_listing_fields):
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 413, in ListObjects
    self._TranslateExceptionAndRaise(e, bucket_name=bucket_name)
    File "/Users/pc/Documents/programs/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1471, in _TranslateExceptionAndRaise
    raise translated_exception
    AccessDeniedException: AccessDeniedException: 403 AccessDenied


    AccessDeniedException: 403 AccessDenied
  • Best answer

    I'll assume you were able to set up your gcloud credentials with gcloud init and gcloud auth login (or gcloud auth activate-service-account), and can successfully list and write objects in GCS.
    From there, you need two things: a correctly configured AWS IAM policy applied to the AWS user you're using, and a correctly configured ~/.boto file.
    AWS S3 IAM policy for bucket access
    A policy like the one below must be applied, either via a role granted to the user or an inline policy attached to the user.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::some-s3-bucket/*",
            "arn:aws:s3:::some-s3-bucket"
          ]
        }
      ]
    }
    The important part is that you have the ListBucket and GetObject actions, and that the resources those actions cover include, at minimum, the bucket (or a prefix within it) you want to read from.
    .boto file configuration
    Interop between service providers is always a bit tricky. At the time of writing, in order to support AWS Signature V4 (the only version universally supported across all AWS regions), you have to add a couple of extra properties to your ~/.boto file beyond just the credentials, in an [s3] group.
    [Credentials]
    aws_access_key_id = [YOUR AKID]
    aws_secret_access_key = [YOUR SECRET AK]
    [s3]
    use-sigv4=True
    host=s3.us-east-2.amazonaws.com
    The use-sigv4 property hints to Boto, via gsutil, to use AWS Signature V4 for requests. Unfortunately, this currently also requires specifying the host in the configuration. It's easy to figure out the host name, as it follows the pattern s3.[BUCKET REGION].amazonaws.com.
    If you have rsync/cp jobs that span multiple S3 regions, you can handle it a few ways. You can set an environment variable like BOTO_CONFIG before running the command to switch between multiple config files. Alternatively, you can override the setting on each run with a top-level argument, e.g.: gsutil -o s3:host=s3.us-east-2.amazonaws.com ls s3://some-s3-bucket
    Edit:
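    Putting the pieces together, here is a minimal sketch of the per-run override plus the actual S3-to-GCS copy. The bucket names are placeholders from this thread, and the region us-east-1 is taken from the x-amz-bucket-region header in the 403 response above:

```shell
# Build the SigV4 host name from the bucket's region.
# The failed request above reported "x-amz-bucket-region: us-east-1".
REGION=us-east-1
HOST="s3.${REGION}.amazonaws.com"
echo "$HOST"   # s3.us-east-1.amazonaws.com

# With credentials in ~/.boto, list the bucket and then copy only the files
# you need (hypothetical names; requires valid AWS and GCS credentials):
#   gsutil -o "s3:host=${HOST}" ls s3://my_amazon_bucket_id
#   gsutil -m -o "s3:host=${HOST}" cp "s3://my_amazon_bucket_id/some/prefix/*" gs://my-gcs-bucket/
```

    The -m flag parallelizes the copy, which helps when pulling many objects across providers.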
    Just wanted to add... another cool way to get this job done is rclone.
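    A sketch of what that might look like: two remotes in rclone's config file, followed by a copy of just the prefix you need. The remote names, region, and GCS project below are illustrative, reusing the placeholders from this thread:

```ini
; ~/.config/rclone/rclone.conf (sketch; remote names are made up)
[s3src]
type = s3
provider = AWS
access_key_id = my_access_key
secret_access_key = my_secret_access_key
region = us-east-1

[gcsdst]
type = google cloud storage
project_number = my_gc_id
```

    Then something like rclone copy s3src:my_amazon_bucket_id/some/prefix gcsdst:my-gcs-bucket/some/prefix copies only that prefix, and rclone ls s3src:my_amazon_bucket_id serves as a quick credentials check.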

    Regarding google-cloud-storage: Using gsutil to download data from AWS S3 to GCS, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47025324/
