
amazon-web-services - AWS Glue does not detect partitions and creates 1000+ tables in the catalog

Reposted. Author: 行者123. Updated: 2023-12-04 15:07:12

I am using AWS Glue to create metadata tables.
AWS Glue Crawler data store path: s3://bucket-name/
The bucket structure in S3 looks like this:

├── bucket-name        
│   ├── pt=2011-10-11-01
│   │   ├── file1
│   │   ├── file2
│   ├── pt=2011-10-11-02
│   │   ├── file1
│   ├── pt=2011-10-10-01
│   │   ├── file1
│   ├── pt=2011-10-11-10
│   │   ├── file1


For this layout, the AWS Glue crawler creates 4 tables.
My question is: why doesn't the AWS Glue crawler detect the partitions?
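To make the expected outcome concrete: the folder names above follow the Hive-style `key=value` convention, so the hoped-for result is a single table with one partition column `pt` and four partition values, rather than four tables. The sketch below only illustrates that reading of the layout in plain Python; it is not how the crawler itself works internally, and `parse_partition` is a hypothetical helper named for this example.

```python
from collections import defaultdict


def parse_partition(key: str):
    """Split an object key such as 'pt=2011-10-11-01/file1' into
    (partition columns, file name). Hypothetical helper for illustration."""
    *folders, filename = key.split("/")
    partitions = {}
    for folder in folders:
        # Only folders of the form name=value are treated as partition columns.
        if "=" in folder:
            name, value = folder.split("=", 1)
            partitions[name] = value
    return partitions, filename


# Object keys taken from the bucket layout shown in the question.
keys = [
    "pt=2011-10-11-01/file1",
    "pt=2011-10-11-01/file2",
    "pt=2011-10-11-02/file1",
    "pt=2011-10-10-01/file1",
    "pt=2011-10-11-10/file1",
]

# Group files by the value of the 'pt' partition column.
files_by_partition = defaultdict(list)
for key in keys:
    partitions, filename = parse_partition(key)
    files_by_partition[partitions["pt"]].append(filename)

for value in sorted(files_by_partition):
    print(f"pt={value}: {files_by_partition[value]}")
```

Read this way, the five objects belong to one dataset split across four `pt` values, which is what a single partitioned table would represent.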

Best answer

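To force Glue to merge multiple schemas together, make sure this option is checked when creating the crawler:
Create a single schema for each S3 path.
[Screenshot of the crawler creation step, with this setting enabled]
Here is a detailed explanation, quoted directly from the AWS documentation (reference):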

By default, when a crawler defines tables for data stored in Amazon S3, it considers both data compatibility and schema similarity. Data compatibility factors taken into account include whether the data is of the same format (for example, JSON), the same compression type (for example, GZIP), the structure of the Amazon S3 path, and other data attributes. Schema similarity is a measure of how closely the schemas of separate Amazon S3 objects are similar.

You can configure a crawler to CombineCompatibleSchemas into a common table definition when possible. With this option, the crawler still considers data compatibility, but ignores the similarity of the specific schemas when evaluating Amazon S3 objects in the specified include path.

If you are configuring the crawler on the console, to combine schemas, select the crawler option Create a single schema for each S3 path.
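Outside the console, the same option can be set through the crawler's `Configuration` JSON. The following is a minimal sketch using boto3, assuming the crawler already exists and is not currently running; the crawler name passed to `apply_to_crawler` is a placeholder, and both function names are hypothetical helpers for this example.

```python
import json


def build_grouping_configuration() -> str:
    """Return the crawler Configuration JSON that asks Glue to combine
    compatible schemas into a single table definition."""
    config = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    }
    return json.dumps(config)


def apply_to_crawler(crawler_name: str) -> None:
    """Apply the grouping policy to an existing crawler.

    Assumes AWS credentials and region are already configured; the crawler
    name is whatever your own crawler is called.
    """
    import boto3  # imported here so the JSON helper works without boto3 installed

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,
        Configuration=build_grouping_configuration(),
    )


if __name__ == "__main__":
    # Print the configuration only; call apply_to_crawler("<your-crawler>")
    # yourself once you have checked it.
    print(build_grouping_configuration())
```

After updating the configuration, the crawler has to be run again for the change to take effect. Tables created by earlier runs may need to be deleted from the catalog by hand.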

Original question on Stack Overflow, "AWS Glue does not detect partitions and creates 1000+ tables in the catalog": https://stackoverflow.com/questions/48166100/
