amazon-web-services - Terraform AWS Athena: using a Glue catalog as the database

Reposted. Author: 行者123. Updated: 2023-12-04 15:29:12

I'm confused about how I should use Terraform to connect Athena to my Glue Catalog database.

I use

resource "aws_glue_catalog_database" "catalog_database" {
  name = "${var.glue_db_name}"
}

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.crawler_iam_role.name}"
  description   = "${var.crawler_description}"
  table_prefix  = "${var.table_prefix}"
  schedule      = "${var.schedule}"

  s3_target {
    path = "s3://${var.data_bucket_name[0]}"
  }
  s3_target {
    path = "s3://${var.data_bucket_name[1]}"
  }
}

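to create a Glue database and a crawler that crawls S3 buckets (only two here), but I don't know how to link the Athena query service to the Glue DB. In the Terraform documentation for Athena, there doesn't appear to be a way to connect Athena to a Glue catalog, only to an S3 bucket. Clearly, however, Athena can be integrated with Glue.

How can I build an Athena database that uses the Glue catalog as its data source instead of an S3 bucket?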

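Best answer

Our current basic setup has Glue crawl an S3 bucket and create/update tables in a Glue DB, which can then be queried in Athena. It looks like this:

Crawler role and role policy: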

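  • The IAM role's assume role policy only needs Glue as the principal
  • The IAM role policy allows actions on Glue, S3, and logs
  • The Glue actions and resources can probably be narrowed down to the ones that are really needed
  • The S3 actions are limited to the ones the crawler needs
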
    resource "aws_iam_role" "glue_crawler_role" {
      name = "analytics_glue_crawler_role"

      assume_role_policy = <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Principal": {
            "Service": "glue.amazonaws.com"
          },
          "Effect": "Allow",
          "Sid": ""
        }
      ]
    }
    EOF
    }
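
Hand-written JSON in a heredoc is easy to break with a stray comma. As a hedged alternative, the same trust policy can be generated with the `aws_iam_policy_document` data source. This is a sketch and not part of the original answer: the data source name `glue_assume_role` is hypothetical, and Terraform 0.12 or later expression syntax is assumed. It is meant to replace the role definition above, not to be declared alongside it.

```hcl
# Trust policy that lets only the Glue service assume the role.
data "aws_iam_policy_document" "glue_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["glue.amazonaws.com"]
    }
  }
}

# Same role as above, with the heredoc replaced by the rendered document.
resource "aws_iam_role" "glue_crawler_role" {
  name               = "analytics_glue_crawler_role"
  assume_role_policy = data.aws_iam_policy_document.glue_assume_role.json
}
```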

    resource "aws_iam_role_policy" "glue_crawler_role_policy" {
      name = "analytics_glue_crawler_role_policy"
      role = "${aws_iam_role.glue_crawler_role.id}"

      policy = <<EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "glue:*"
          ],
          "Resource": [
            "*"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetBucketAcl",
            "s3:GetObject",
            "s3:PutObject",
            "s3:DeleteObject"
          ],
          "Resource": [
            "arn:aws:s3:::analytics-product-data",
            "arn:aws:s3:::analytics-product-data/*"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": [
            "arn:aws:logs:*:*:/aws-glue/*"
          ]
        }
      ]
    }
    EOF
    }
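
The point about narrowing the Glue permissions could look roughly like the policy below, which is meant to take the place of the `glue:*` statement. The action list is an assumption: these are real Glue API actions, but whether they are sufficient for a given crawler has not been verified here, so test it against your own crawler before relying on it. The resource name `glue_crawler_catalog_policy` is hypothetical, Terraform 0.12 or later (`jsonencode`) is assumed, and the policy refers to the `analytics_db` Glue database defined further down.

```hcl
resource "aws_iam_role_policy" "glue_crawler_catalog_policy" {
  name = "analytics_glue_crawler_catalog_policy"
  role = aws_iam_role.glue_crawler_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        # Assumed minimal set for a crawler that adds, updates, and deletes tables.
        Action = [
          "glue:GetDatabase",
          "glue:GetTable",
          "glue:GetTables",
          "glue:CreateTable",
          "glue:UpdateTable",
          "glue:DeleteTable",
          "glue:GetPartition",
          "glue:GetPartitions",
          "glue:BatchGetPartition",
          "glue:BatchCreatePartition",
          "glue:UpdatePartition",
          "glue:BatchDeletePartition",
        ]
        # Limit access to the catalog, this one database, and its tables.
        Resource = [
          "arn:aws:glue:*:*:catalog",
          "arn:aws:glue:*:*:database/${aws_glue_catalog_database.analytics_db.name}",
          "arn:aws:glue:*:*:table/${aws_glue_catalog_database.analytics_db.name}/*",
        ]
      }
    ]
  })
}
```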

S3 bucket, Glue database, and crawler:

    resource "aws_s3_bucket" "product_bucket" {
      bucket = "analytics-product-data"
      acl    = "private"
    }

    resource "aws_glue_catalog_database" "analytics_db" {
      name = "inventory-analytics-db"
    }

    resource "aws_glue_crawler" "product_crawler" {
      database_name = "${aws_glue_catalog_database.analytics_db.name}"
      name          = "analytics-product-crawler"
      role          = "${aws_iam_role.glue_crawler_role.arn}"

      schedule = "cron(0 0 * * ? *)"

      configuration = "{\"Version\": 1.0, \"CrawlerOutput\": { \"Partitions\": { \"AddOrUpdateBehavior\": \"InheritFromTable\" }, \"Tables\": {\"AddOrUpdateBehavior\": \"MergeNewColumns\" } } }"

      schema_change_policy {
        delete_behavior = "DELETE_FROM_DATABASE"
      }

      s3_target {
        path = "s3://${aws_s3_bucket.product_bucket.bucket}/products"
      }
    }
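
No Athena-specific resource is needed to make the link itself: Athena reads its table metadata from the Glue Data Catalog, so the database created above should show up in Athena under the default `AwsDataCatalog` data source once the crawler has populated it. If you also want the Athena side managed in Terraform, a minimal sketch might look like the following. Everything in it is an assumption and not part of the original answer: the results bucket name, the workgroup and query names, and the table name `products` (the real table name is derived by the crawler from the S3 path). Terraform 0.12 or later syntax is assumed.

```hcl
# Bucket where Athena writes query results (hypothetical name).
resource "aws_s3_bucket" "athena_results" {
  bucket = "analytics-athena-query-results"
}

# Workgroup that sends query output to the results bucket.
resource "aws_athena_workgroup" "analytics" {
  name = "analytics"

  configuration {
    result_configuration {
      output_location = "s3://${aws_s3_bucket.athena_results.bucket}/output/"
    }
  }
}

# Saved query that runs against the Glue database populated by the crawler.
resource "aws_athena_named_query" "sample_products" {
  name      = "sample-products"
  workgroup = aws_athena_workgroup.analytics.id
  database  = aws_glue_catalog_database.analytics_db.name
  query     = "SELECT * FROM products LIMIT 10;"
}
```

The named query only saves the SQL in Athena; it does not run it. The `database` argument points at the Glue database by name, which is the only place the two services are tied together in this sketch.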

Regarding amazon-web-services - Terraform AWS Athena: using a Glue catalog as the database, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55129035/
