amazon-dynamodb - How to export a DynamoDB table with on-demand provisioning using Data Pipeline

Reposted · Author: 行者123 · Updated: 2023-12-04 10:13:56

I used to export DynamoDB tables to files with the Data Pipeline template called Export DynamoDB table to S3. I recently updated all of my DynamoDB tables to use on-demand provisioning, and the template no longer works. I'm fairly sure this is because the old template specifies a percentage of DynamoDB throughput to consume, which is irrelevant for on-demand tables.

I tried exporting the old template as JSON, removing the reference to throughput percentage consumption, and creating a new pipeline. However, this was unsuccessful.

Can anyone suggest how to convert an old-style pipeline script using provisioned throughput into one that works with on-demand tables?

Here is my original, working script:

{
  "objects": [
    {
      "name": "DDBSourceTable",
      "id": "DDBSourceTable",
      "type": "DynamoDBDataNode",
      "tableName": "#{myDDBTableName}"
    },
    {
      "name": "EmrClusterForBackup",
      "coreInstanceCount": "1",
      "coreInstanceType": "m3.xlarge",
      "releaseLabel": "emr-5.13.0",
      "masterInstanceType": "m3.xlarge",
      "id": "EmrClusterForBackup",
      "region": "#{myDDBRegion}",
      "type": "EmrCluster"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "output": { "ref": "S3BackupLocation" },
      "input": { "ref": "DDBSourceTable" },
      "maximumRetries": "2",
      "name": "TableBackupActivity",
      "step": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}",
      "id": "TableBackupActivity",
      "runsOn": { "ref": "EmrClusterForBackup" },
      "type": "EmrActivity",
      "resizeClusterBeforeRunning": "true"
    },
    {
      "directoryPath": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "S3BackupLocation",
      "id": "S3BackupLocation",
      "type": "S3DataNode"
    }
  ],
  "parameters": [
    {
      "description": "Output S3 folder",
      "id": "myOutputS3Loc",
      "type": "AWS::S3::ObjectKey"
    },
    {
      "description": "Source DynamoDB table name",
      "id": "myDDBTableName",
      "type": "String"
    },
    {
      "default": "0.25",
      "watermark": "Enter value between 0.1-1.0",
      "description": "DynamoDB read throughput ratio",
      "id": "myDDBReadThroughputRatio",
      "type": "Double"
    },
    {
      "default": "us-east-1",
      "watermark": "us-east-1",
      "description": "Region of the DynamoDB table",
      "id": "myDDBRegion",
      "type": "String"
    }
  ],
  "values": {
    "myDDBRegion": "us-east-1",
    "myDDBTableName": "LIVE_Invoices",
    "myDDBReadThroughputRatio": "0.25",
    "myOutputS3Loc": "s3://company-live-extracts/"
  }
}

Here is my updated attempt, which failed:
{
  "objects": [
    {
      "name": "DDBSourceTable",
      "id": "DDBSourceTable",
      "type": "DynamoDBDataNode",
      "tableName": "#{myDDBTableName}"
    },
    {
      "name": "EmrClusterForBackup",
      "coreInstanceCount": "1",
      "coreInstanceType": "m3.xlarge",
      "releaseLabel": "emr-5.13.0",
      "masterInstanceType": "m3.xlarge",
      "id": "EmrClusterForBackup",
      "region": "#{myDDBRegion}",
      "type": "EmrCluster"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "output": { "ref": "S3BackupLocation" },
      "input": { "ref": "DDBSourceTable" },
      "maximumRetries": "2",
      "name": "TableBackupActivity",
      "step": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName}",
      "id": "TableBackupActivity",
      "runsOn": { "ref": "EmrClusterForBackup" },
      "type": "EmrActivity",
      "resizeClusterBeforeRunning": "true"
    },
    {
      "directoryPath": "#{myOutputS3Loc}/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "S3BackupLocation",
      "id": "S3BackupLocation",
      "type": "S3DataNode"
    }
  ],
  "parameters": [
    {
      "description": "Output S3 folder",
      "id": "myOutputS3Loc",
      "type": "AWS::S3::ObjectKey"
    },
    {
      "description": "Source DynamoDB table name",
      "id": "myDDBTableName",
      "type": "String"
    },
    {
      "default": "us-east-1",
      "watermark": "us-east-1",
      "description": "Region of the DynamoDB table",
      "id": "myDDBRegion",
      "type": "String"
    }
  ],
  "values": {
    "myDDBRegion": "us-east-1",
    "myDDBTableName": "LIVE_Invoices",
    "myOutputS3Loc": "s3://company-live-extracts/"
  }
}

Here is the error from the Data Pipeline execution:
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java

Best answer

I opened a support ticket with AWS for this. Their response was very thorough. I'll paste it below:

Thank you for contacting us about this issue.

Unfortunately, Data Pipeline export/import jobs for DynamoDB do not support DynamoDB's new on-demand mode [1].

Tables using on-demand capacity do not have defined capacities for read and write units. Data Pipeline relies on this defined capacity when calculating the pipeline's throughput.

For example, if you have 100 RCUs (read capacity units) and a pipeline throughput of 0.25 (25%), the effective pipeline throughput would be 25 read units per second (100 * 0.25).
However, with on-demand capacity, the RCUs and WCUs (write capacity units) are reported as 0. Regardless of the pipeline throughput value, the calculated effective throughput is 0.

The pipeline will not execute when the effective throughput is less than 1.
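The calculation described above can be sketched as follows (the RCU and ratio values are illustrative, taken from the example in the reply; this is not AWS's actual implementation):

```shell
# Effective pipeline read rate = provisioned RCUs x myDDBReadThroughputRatio.
rcu=100          # provisioned ReadCapacityUnits
ratio=0.25       # myDDBReadThroughputRatio
effective=$(awk "BEGIN { print $rcu * $ratio }")
echo "provisioned table effective rate: $effective units/sec"   # prints 25

# An on-demand table reports 0 RCUs, so the same formula always yields 0,
# which is below the minimum of 1 that Data Pipeline requires to run.
rcu_on_demand=0
echo "on-demand table effective rate: $(awk "BEGIN { print $rcu_on_demand * $ratio }")"   # prints 0
```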

Do you need to export your DynamoDB tables to S3?

If you are exporting these tables only for backup purposes, I recommend using DynamoDB's On-Demand Backup and Restore feature (confusingly, it has a name similar to on-demand capacity) [2].

Note that on-demand backups do not affect table throughput and complete within seconds. You pay only for the S3 storage costs associated with the backups.
However, these table backups are not directly accessible to customers; they can only be restored to the source table. This backup method is not suitable if you want to perform analytics on the backup data or import the data into other systems, accounts, or tables.

If you need to use Data Pipeline to export DynamoDB data, then the only way forward is to set the table(s) to Provisioned Capacity mode.

You could do this manually, or include it as an activity in the pipeline itself, using an AWS CLI command [3].

For example (on-demand is also known as Pay-per-Request mode):

$ aws dynamodb update-table --table-name myTable --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

--
$ aws dynamodb update-table --table-name myTable --billing-mode PAY_PER_REQUEST

Note that after disabling on-demand capacity mode, you need to wait 24 hours before you can enable it again.
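As a rough sketch of the "include it as an activity in the pipeline itself" option, the mode switch could be expressed as a ShellCommandActivity object that runs before the export. The object name, the capacity values, and the dependency wiring below are my own assumptions, not from the support reply:

```json
{
  "name": "SetProvisionedMode",
  "id": "SetProvisionedMode",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "EmrClusterForBackup" },
  "command": "aws dynamodb update-table --table-name #{myDDBTableName} --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100 --region #{myDDBRegion}"
}
```

TableBackupActivity would then need a "dependsOn": { "ref": "SetProvisionedMode" } entry so the export waits for this step. Two caveats: the pipeline's resource role must be allowed to call dynamodb:UpdateTable, and update-table returns while the table is still in UPDATING status, so in practice the command would also need to wait for the table to become ACTIVE before the export starts.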

=== Reference links ===

[1] DynamoDB On-Demand capacity (see also the note on unsupported services/tools): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.OnDemand

[2] DynamoDB On-Demand Backup and Restore: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BackupRestore.html

[3] AWS CLI reference for DynamoDB "update-table": https://docs.aws.amazon.com/cli/latest/reference/dynamodb/update-table.html

A similar question about exporting an on-demand DynamoDB table with Data Pipeline was found on Stack Overflow: https://stackoverflow.com/questions/54666788/
