We have a DynamoDB table whose primary key consists of a Hash key and a Range key:
Hash = date.random_number
Range = timestamp
How can we fetch the items whose timestamps fall between X and Y? Because a random_number is appended to the hash key, the query has to be fired multiple times. Is it possible to supply multiple hash values together with a single RangeKeyCondition?
What is most efficient in terms of cost and time?
The random number ranges from 1 to 10.
Best Answer
If I understood correctly, you have a table with the following primary key definition:
Hash Key : date.random_number
Range Key : timestamp
One thing you must keep in mind is that, whether you use GetItem or Query, your application has to be able to compute the Hash Key in order to successfully retrieve one or more items from the table.
Using a random number as part of the hash key makes sense, so that your records are spread evenly across the DynamoDB partitions; however, you have to do it in a way that your application can still compute those numbers when it needs to retrieve the records.
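Since the random suffix must be recomputable, the application can simply enumerate every hash key a record written on a given date could have. A minimal sketch, assuming the key format is `date + "." + n` with n from 1 to 10 as described in the question (the method and class names here are illustrative, not part of any SDK):

```java
import java.util.ArrayList;
import java.util.List;

public class HashKeys {
    // Enumerate every hash key that a record written on `date` could carry.
    // The "date.random_number" format and the 1..10 range come from the question.
    static List<String> candidateHashKeys(String date) {
        List<String> keys = new ArrayList<>();
        for (int n = 1; n <= 10; n++) {
            keys.add(date + "." + n);
        }
        return keys;
    }
}
```

Each of these keys would then be queried individually, since DynamoDB's Query operation accepts exactly one hash key value per request.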
With that in mind, let's build the query needed to meet the stated requirements. The native AWS DynamoDB operations available for fetching multiple items from a table are:
Query, BatchGetItem and Scan
To use BatchGetItem, you would need to know the entire primary key (hash key and range key) of every item in advance, which is not the case here.
The Scan operation literally walks through every record in the table, which I believe is unnecessary for your requirements.
Finally, the Query operation lets you retrieve one or more items by applying the EQ (equality) operator to the hash key, together with one of several other operators on the range key for when you don't have the entire Range Key or want to match more than one item.
The operator options for the Range Key condition are: EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN
In my opinion, the operator that best fits your requirements is BETWEEN. That said, let's see how the query can be built with the chosen SDK:
Table table = dynamoDB.getTable(tableName);

String hashKey = "<YOUR_COMPUTED_HASH_KEY>";
String timestampX = "<YOUR_TIMESTAMP_X_VALUE>";
String timestampY = "<YOUR_TIMESTAMP_Y_VALUE>";

RangeKeyCondition rangeKeyCondition =
    new RangeKeyCondition("RangeKeyAttributeName").between(timestampX, timestampY);

ItemCollection<QueryOutcome> items = table.query("HashKeyAttributeName", hashKey,
    rangeKeyCondition,
    null,  // FilterExpression - not used in this example
    null,  // ProjectionExpression - not used in this example
    null,  // ExpressionAttributeNames - not used in this example
    null); // ExpressionAttributeValues - not used in this example
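Because one Query must be issued per random suffix, the application ends up with up to ten partial result lists, each already sorted by the range key, and has to merge them into a single timestamp-ordered list. A minimal merge sketch, using a hypothetical `Item` record as a stand-in for the SDK's item type:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ResultMerger {
    // Hypothetical stand-in for the SDK item: the hash key it came from
    // plus the timestamp attribute (the table's range key).
    record Item(String hashKey, String timestamp) {}

    // Concatenate the per-hash-key result lists and re-sort by timestamp.
    // For ten lists of modest size, a concatenate-and-sort is simpler than
    // a k-way merge and fast enough.
    static List<Item> merge(List<List<Item>> perKeyResults) {
        List<Item> all = new ArrayList<>();
        for (List<Item> partial : perKeyResults) {
            all.addAll(partial);
        }
        all.sort(Comparator.comparing(Item::timestamp));
        return all;
    }
}
```

The per-key queries are independent, so they can also be issued in parallel to reduce wall-clock latency; the merge step is unchanged either way.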
You may want to take a look at the following post for more information about DynamoDB primary keys: DynamoDB: When to use what PK type?
Question: My concern is the multiple queries caused by the appended random_number. Is there a way to combine those queries and hit DynamoDB only once?
Your concern is completely understandable; however, the only way to fetch all the records via BatchGetItem would be to know the entire primary key (HASH + RANGE) of every record you intend to fetch. While minimizing the HTTP round trips to the server may seem the best solution at first sight, the documentation actually recommends doing exactly what you are doing, to avoid hot partitions and uneven use of your provisioned throughput:
Design For Uniform Data Access Across Items In Your Tables
"Because you are randomizing the hash key, the writes to the table on each day are spread evenly across all of the hash key values; this will yield better parallelism and higher overall throughput. [...] To read all of the items for a given day, you would still need to Query each of the 2014-07-09.N keys (where N is 1 to 200), and your application would need to merge all of the results. However, you will avoid having a single "hot" hash key taking all of the workload."
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
One more interesting point: the guidelines also recommend keeping reads to a single partition moderate. If you removed the random number from the hash key so that you could fetch all the records in one go, you would most likely run into the following problem, regardless of whether you use Scan, Query or BatchGetItem:
Guidelines for Query and Scan - Avoid Sudden Bursts of Read Activity
"Note that it is not just the burst of capacity units the Scan uses that is a problem. It is also because the scan is likely to consume all of its capacity units from the same partition because the scan requests read items that are next to each other on the partition. This means that the request is hitting the same partition, causing all of its capacity units to be consumed, and throttling other requests to that partition. If the request to read data had been spread across multiple partitions, then the operation would not have throttled a specific partition."
Lastly, since you are dealing with time series data, it may be helpful to look into some of the best practices suggested by the documentation:
Understand Access Patterns for Time Series Data
For each table that you create, you specify the throughput requirements. DynamoDB allocates and reserves resources to handle your throughput requirements with sustained low latency. When you design your application and tables, you should consider your application's access pattern to make the most efficient use of your table's resources.
Suppose you design a table to track customer behavior on your site, such as URLs that they click. You might design the table with hash and range type primary key with Customer ID as the hash attribute and date/time as the range attribute. In this application, customer data grows indefinitely over time; however, the applications might show uneven access pattern across all the items in the table where the latest customer data is more relevant and your application might access the latest items more frequently and as time passes these items are less accessed, eventually the older items are rarely accessed. If this is a known access pattern, you could take it into consideration when designing your table schema. Instead of storing all items in a single table, you could use multiple tables to store these items. For example, you could create tables to store monthly or weekly data. For the table storing data from the latest month or week, where data access rate is high, request higher throughput and for tables storing older data, you could dial down the throughput and save on resources.
You can save on resources by storing "hot" items in one table with higher throughput settings, and "cold" items in another table with lower throughput settings. You can remove old items by simply deleting the tables. You can optionally backup these tables to other storage options such as Amazon Simple Storage Service (Amazon S3). Deleting an entire table is significantly more efficient than removing items one-by-one, which essentially doubles the write throughput as you do as many delete operations as put operations.
Source: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html
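The per-period-table pattern quoted above amounts to routing each write (and read) to a table named after its time bucket. A minimal sketch, assuming an illustrative `events_` prefix and monthly buckets (both the prefix and the name format are assumptions, not anything the docs mandate):

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

public class TableRouter {
    // Route an item to the table for its month, e.g. "events_2014_07".
    // Old months can then be down-provisioned or dropped as whole tables,
    // which is far cheaper than deleting items one by one.
    static String tableFor(YearMonth month) {
        return "events_" + month.format(DateTimeFormatter.ofPattern("yyyy_MM"));
    }
}
```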
Regarding java - DynamoDB Scan, Query and BatchGet, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29674951/