
amazon-web-services - How to query DynamoDB by date (range key) with no obvious hash key?

Reposted. Author: 行者123. Updated: 2023-12-03 08:56:27

I need to keep local data in an iOS app in sync with the data in a DynamoDB table. The DynamoDB table has about 2K rows, with only a hash key (id), and the following attributes:

  • id (uuid)
  • lastModifiedAt (timestamp)
  • name
  • latitude
  • longitude

    I am currently scanning and filtering by lastModifiedAt, where lastModifiedAt is greater than the app's last refresh date, but I imagine this will become expensive.

    The best answer I could find was to add a Global Secondary Index with lastModifiedAt as the range key, but there is no obvious hash key for the GSI.

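    What is the best practice when you need to query by range using a GSI but there is no obvious hash key? Alternatively, if a full scan is the only option, are there any best practices for keeping the cost down?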

    Best answer

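    While a Global Secondary Index seems to fit your requirements, any attempt to include timestamp-related information as part of your Hash Key will most likely create what is known as a "hot partition", which is highly undesirable.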

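    Access will be uneven because the most recent items will be retrieved far more often than the older ones. This will not only hurt your performance but also make your solution less cost-effective.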

    See some details from the documentation:

    For example, if a table has a very small number of heavily accessed partition key values, possibly even a single very heavily used partition key value, request traffic is concentrated on a small number of partitions – potentially only one partition. If the workload is heavily unbalanced, meaning that it is disproportionately focused on one or a few partitions, the requests will not achieve the overall provisioned throughput level. To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.
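
    To make the quoted point concrete, here is a small illustrative simulation. It is only a sketch: the partition count and the MD5-based `partition_for` function are assumptions standing in for DynamoDB's internal hashing, which is not exposed. It compares keying 2,000 items by id with keying them by their modification day.

```python
import hashlib
import uuid
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real partition count


def partition_for(key: str) -> int:
    """Map a key to a partition (a stand-in for DynamoDB's internal hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# 2,000 items that were all modified on the same day.
item_ids = [str(uuid.uuid5(uuid.NAMESPACE_DNS, f"item-{i}")) for i in range(2000)]

# Hash key = id: the items spread across the partitions.
by_id = Counter(partition_for(item_id) for item_id in item_ids)

# Hash key = modification day: every item shares one key value,
# so every request lands on the same partition.
by_day = Counter(partition_for("2016-04-06") for _ in item_ids)

print("keyed by id :", dict(by_id))
print("keyed by day:", dict(by_day))
```

    With the day as the hash key, all 2,000 items fall into a single bucket, which is the "hot partition" situation described above.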



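    Based on that statement, id does seem to be a good choice for your Hash Key (a.k.a. Partition Key), and I would not change it, because GSI keys are partitioned in the same way. As a separate note, performance is highly optimized when you retrieve data by providing the entire Primary Key, so we should try to find a solution that provides it whenever possible.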

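    I would suggest creating separate tables that store the primary keys according to when the items were updated. You can segment the data into tables at whatever granularity best suits your use case. For example, say you want to segment the updates by day: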

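    a. Your daily updates could be stored in tables with the following naming convention: updates_DDMM
    The updates_DDMM tables would hold only the ids (the hash keys of the other table).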
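
    As a minimal sketch of this naming convention (the helper names `update_table_name` and `update_tables_between` are hypothetical, not part of any AWS SDK), the table name for a given day and the list of tables covering a range of days could be derived like this:

```python
from datetime import date, timedelta
from typing import List


def update_table_name(day: date) -> str:
    """Return the per-day update table name, e.g. updates_0504 for 5 April."""
    return f"updates_{day:%d%m}"


def update_tables_between(first_day: date, last_day: date) -> List[str]:
    """List the update tables for every day from first_day to last_day, inclusive."""
    day_count = (last_day - first_day).days + 1
    return [update_table_name(first_day + timedelta(days=n)) for n in range(day_count)]


print(update_tables_between(date(2016, 4, 5), date(2016, 4, 6)))
```

    Note that DDMM carries no year, so a table name repeats after twelve months unless the old tables are deleted in the meantime.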

    Now suppose the latest app refresh date was 2 days ago (04/07/16) and you need to fetch the recent records. You would then:

    i. Scan the tables updates_0504 and updates_0604 to get all the hash keys.

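    ii. Finally, fetch the records from the main table (containing lat/lng, name, etc.) by submitting a BatchGetItem with all the hash keys obtained.
    BatchGetItem is very fast and does the job like no other operation.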
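
    Step ii needs a little care in practice: a single BatchGetItem request accepts at most 100 keys and rejects duplicate keys. The sketch below only builds the request payloads in the low-level attribute-value format and makes no network calls. The table name `places` and the helper `build_batch_get_requests` are assumptions for illustration, and the id is assumed to be stored as a string attribute.

```python
from typing import Dict, Iterable, List

BATCH_GET_LIMIT = 100  # BatchGetItem accepts at most 100 keys per request
MAIN_TABLE = "places"  # hypothetical name of the main table


def build_batch_get_requests(ids: Iterable[str], table: str = MAIN_TABLE) -> List[Dict]:
    """Split the collected ids into BatchGetItem payloads of at most 100 keys each."""
    # An id updated on several days appears in several update tables,
    # so drop duplicates while keeping the original order.
    unique_ids = list(dict.fromkeys(ids))
    requests = []
    for start in range(0, len(unique_ids), BATCH_GET_LIMIT):
        chunk = unique_ids[start:start + BATCH_GET_LIMIT]
        keys = [{"id": {"S": item_id}} for item_id in chunk]
        requests.append({"RequestItems": {table: {"Keys": keys}}})
    return requests


# Example: 250 ids gathered from the update tables, one of them seen twice.
collected = [f"id-{i}" for i in range(250)] + ["id-0"]
for request in build_batch_get_requests(collected):
    print(len(request["RequestItems"][MAIN_TABLE]["Keys"]))
```

    With boto3, each payload could presumably be passed as `client.batch_get_item(**request)`. Any `UnprocessedKeys` in a response would need to be resubmitted.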

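    One could argue that creating additional tables adds cost to the overall solution... well, with a GSI you are essentially duplicating your table (if you project all the fields) and paying that extra cost for all ~2k records, whether or not they were recently updated...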

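    Creating tables like this may seem counter-intuitive, but it is actually a best practice when dealing with time series data (from the AWS DynamoDB documentation):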

    [...] the applications might show uneven access pattern across all the items in the table where the latest customer data is more relevant and your application might access the latest items more frequently and as time passes these items are less accessed, eventually the older items are rarely accessed. If this is a known access pattern, you could take it into consideration when designing your table schema. Instead of storing all items in a single table, you could use multiple tables to store these items. For example, you could create tables to store monthly or weekly data. For the table storing data from the latest month or week, where data access rate is high, request higher throughput and for tables storing older data, you could dial down the throughput and save on resources.

    You can save on resources by storing "hot" items in one table with higher throughput settings, and "cold" items in another table with lower throughput settings. You can remove old items by simply deleting the tables. You can optionally backup these tables to other storage options such as Amazon Simple Storage Service (Amazon S3). Deleting an entire table is significantly more efficient than removing items one-by-one, which essentially doubles the write throughput as you do as many delete operations as put operations.



    Source:
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html

    I hope this helps. Regards.

    For "amazon-web-services - How to query DynamoDB by date (range key) with no obvious hash key?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35963243/
