amazon-web-services - User feed in a database (possibly DynamoDB)

Reposted by: 行者123 Updated: 2023-12-04 08:05:59

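Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Want to improve this question? Update the question so it is on-topic for Stack Overflow.

Closed 4 years ago.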

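I'm considering using DynamoDB to generate user feeds.
I would store UserId (hash) with PostId (range). However, I only need to keep the last 3000 posts in the database, so I'm thinking of having a background task that cleans up the table.
Is this a reasonable approach? I'm not sure whether this type of range query would run reasonably fast, given that I have about 25 million user records.

Please suggest any other options that might work (other than fanout in redis).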

Best answer

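Your case is a typical time series data scenario, in which your records become obsolete as time goes by. There are two main factors you need to watch:

Make sure your tables have even access patterns

If you put all your posts in a single table and access the most recent ones more frequently, your provisioned throughput will not be used efficiently.
You should group the most accessed items in a single table so that the provisioned throughput can be adjusted properly for the required access. Also, make sure you properly define a Hash Key that will allow even distribution of your data across multiple partitions.

Remove obsolete data in the most efficient way (in terms of effort, performance and cost)

The documentation suggests segmenting the data across different tables so that you can delete or back up an entire table once its records become obsolete (see more details below).

For example, you could segment your tables by month:

Posts_April, Posts_May, etc.

Or by count, with each table holding a maximum number of records:

Posts_1, Posts_2, Posts_3, etc.

In this case, you would create a new table once the current table reaches the maximum number of records, and delete or back up the oldest table when cleanup is needed.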
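The count-based rotation described above could be sketched roughly as follows. This is only an illustration of the bookkeeping, not code from the answer: the per-table cap, the number of tables kept, and the function names are all hypothetical, and the actual table creation and deletion calls are left out.

```python
from typing import List, Tuple

MAX_ITEMS_PER_TABLE = 1_000_000  # hypothetical cap per table
MAX_TABLES_KEPT = 3              # hypothetical number of tables to retain


def table_name(index: int) -> str:
    """Return the name of the index-th segment, e.g. Posts_1."""
    return f"Posts_{index}"


def plan_rotation(current_index: int, current_count: int) -> Tuple[str, List[str]]:
    """Decide which table takes the next write and which table, if any, to drop.

    Returns (table_to_write, tables_to_delete).
    """
    if current_count < MAX_ITEMS_PER_TABLE:
        # The current table still has room: nothing to create or delete.
        return table_name(current_index), []
    # The current table is full: roll over to a new one.
    new_index = current_index + 1
    # The table that just fell out of the retention window, if there is one.
    expired = new_index - MAX_TABLES_KEPT
    to_delete = [table_name(expired)] if expired >= 1 else []
    return table_name(new_index), to_delete
```

With these assumed values, a full `Posts_3` leads to writes going to `Posts_4` and `Posts_1` being flagged for deletion or backup.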

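I might need some additional information about your use case to give you better examples of how to take advantage of this approach.

Find below some references to the operations you will need to create and delete tables programmatically:

Create Table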
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html

Delete Table
http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteTable.html
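As a rough sketch, the request for CreateTable could be assembled like this. The key names UserId and PostId come from the question; the attribute types (string and number), the capacity values, and the helper name are assumptions made for illustration.

```python
def posts_table_definition(table_name: str, read_units: int, write_units: int) -> dict:
    """Build the request parameters for the DynamoDB CreateTable operation."""
    return {
        "TableName": table_name,
        # Only key attributes are declared up front; types here are assumed.
        "AttributeDefinitions": [
            {"AttributeName": "UserId", "AttributeType": "S"},
            {"AttributeName": "PostId", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},   # hash key
            {"AttributeName": "PostId", "KeyType": "RANGE"},  # range key
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


params = posts_table_definition("Posts_1", read_units=10, write_units=5)
```

If boto3 is the client library in use (an assumption, the answer does not name one), these parameters could be passed as `boto3.client("dynamodb").create_table(**params)`, and an obsolete segment could be dropped with `delete_table(TableName="Posts_1")`.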

Below is the section of the documentation that explains best practices related to time series data:

Understand Access Patterns for Time Series Data

For each table that you create, you specify the throughput requirements. DynamoDB allocates and reserves resources to handle your throughput requirements with sustained low latency. When you design your application and tables, you should consider your application's access pattern to make the most efficient use of your table's resources.

Suppose you design a table to track customer behavior on your site, such as URLs that they click. You might design the table with hash and range type primary key with Customer ID as the hash attribute and date/time as the range attribute. In this application, customer data grows indefinitely over time; however, the applications might show uneven access pattern across all the items in the table where the latest customer data is more relevant and your application might access the latest items more frequently and as time passes these items are less accessed, eventually the older items are rarely accessed. If this is a known access pattern, you could take it into consideration when designing your table schema. Instead of storing all items in a single table, you could use multiple tables to store these items. For example, you could create tables to store monthly or weekly data. For the table storing data from the latest month or week, where data access rate is high, request higher throughput and for tables storing older data, you could dial down the throughput and save on resources.

You can save on resources by storing "hot" items in one table with higher throughput settings, and "cold" items in another table with lower throughput settings. You can remove old items by simply deleting the tables. You can optionally backup these tables to other storage options such as Amazon Simple Storage Service (Amazon S3). Deleting an entire table is significantly more efficient than removing items one-by-one, which essentially doubles the write throughput as you do as many delete operations as put operations.

Source:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.TimeSeriesDataAccessPatterns

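Updated answer based on additional comments:

"So user id will be my hash key. What I need is the cleanup process... So obviously the approach of separate tables based on date will not work, because the data is filtered not by time range but by count. In other words, I need to have x recent records for each user. And to keep it from growing beyond that x amount, I need the cleanup process."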

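In this case, you can simply define the Hash Key as the UserId and the PostId as the Range Key.

If each user can have a maximum of 10 posts, then the Range Key maximum value would be 10. When you reach the maximum and the user adds a new post, you start over from 1, automatically replacing that user's oldest post (see the DynamoDB PutItem operation for more details). In the end, you are just creating a circular list of posts for each user.

By doing so, you are essentially adding the new post and performing the cleanup in a single write operation.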
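The circular list can be sketched in a few lines. The dictionaries below are in-memory stand-ins for DynamoDB, used only to show the wrap-around arithmetic and the overwrite behavior; the function names and the way the last used slot is tracked are assumptions for illustration.

```python
MAX_POSTS_PER_USER = 10  # the example limit used in the answer


def next_post_id(last_post_id: int, max_posts: int = MAX_POSTS_PER_USER) -> int:
    """Return the slot for the next post, wrapping from max_posts back to 1."""
    return last_post_id % max_posts + 1


# In-memory stand-ins: posts keyed by (UserId, PostId), plus the last slot per user.
posts = {}
last_post_ids = {}


def add_post(user_id: str, body: str) -> int:
    """Store a post in the next slot, overwriting the oldest once the list is full."""
    slot = next_post_id(last_post_ids.get(user_id, 0))
    # Like PutItem: an item with the same primary key is replaced entirely.
    posts[(user_id, slot)] = body
    last_post_ids[user_id] = slot
    return slot
```

Calling `add_post` twelve times for one user leaves exactly ten entries: slots 1 and 2 hold the two newest posts, and the two oldest posts are gone without any separate delete.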

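You may need to create a supporting table that holds the last PostId published by each User. If you define only a hash key, the UserId, you will be able to look up the last PostId for a specific user with the GetItem operation (which is very cheap and fast). The schema for this table could be as simple as:
UserId (Hash Key)
LastPostId (Number attribute) - NOT a Range Key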

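For example, suppose you need to get the three most recent posts from UserId = ABC:

Step 1. Use GetItem on LastPostIds_Table, providing the UserId (Hash Key) = "ABC".
If LastPostId = 4, then

Step 2. Use BatchGetItem on Posts_Table to get the records with UserId (Hash Key) = "ABC" and PostId (Range Key) = 4, 3 and 2.

From the returned PostIds you will know that 4 is the newest and 2 is the oldest.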
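Once the list has wrapped around, the most recent range keys are no longer simply the largest numbers, so Step 2 needs a little modular arithmetic. The sketch below assumes slots numbered 1 to `max_posts` as in the answer; the helper names and the string type for UserId are assumptions, and the key layout follows the typed format of DynamoDB's low-level API.

```python
from typing import Dict, List


def recent_post_ids(last_post_id: int, count: int, max_posts: int = 10) -> List[int]:
    """List the range keys of the most recent posts, newest first."""
    count = min(count, max_posts)  # there are never more than max_posts slots
    return [(last_post_id - 1 - i) % max_posts + 1 for i in range(count)]


def batch_get_keys(user_id: str, post_ids: List[int]) -> List[Dict]:
    """Shape primary keys in the typed form used by the low-level BatchGetItem API."""
    return [{"UserId": {"S": user_id}, "PostId": {"N": str(p)}} for p in post_ids]


keys = batch_get_keys("ABC", recent_post_ids(4, 3))
```

For LastPostId = 4 this yields PostIds 4, 3 and 2, matching the example. For LastPostId = 1 on a full list it yields 1, 10 and 9, so recency should be taken from this computed order rather than from the numeric value. BatchGetItem does not return items in any particular order, and keys that do not exist yet (a user with fewer posts than slots) are simply absent from the response.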

WARNING: Using BatchGetItem to return many records may cause sudden bursts of reading activity. This is easily resolved by breaking the read operation down into several smaller batches.
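Splitting the keys into smaller batches could look like the sketch below. The batch size of 25 is an arbitrary illustrative value; a single BatchGetItem call accepts at most 100 keys, so any size up to that limit would work, and the helper name is hypothetical.

```python
from typing import Iterator, List


def chunked(keys: List, batch_size: int = 25) -> Iterator[List]:
    """Yield successive slices of keys, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


# Each batch would be sent as its own BatchGetItem request, optionally with a
# short pause between requests to smooth out the read load.
batches = list(chunked(list(range(60)), batch_size=25))
```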
PutItem can help you implement the Post persistence logic:

PutItem Creates a new item, or replaces an old item with a new item. If an item that has the same primary key as the new item already exists in the specified table, the new item completely replaces the existing item. You can perform a conditional put operation (add a new item if one with the specified primary key doesn't exist), or replace an existing item if it has certain attribute values.

Source: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html

Regarding amazon-web-services - User feed in a database (possibly DynamoDB), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29785925/
