
How to keep database and object store consistent to avoid orphan objects?

Reposted. Author: bug小助手. Updated: 2023-10-28 10:08:23



I am writing an online text editor. I want to allow users to add inline images and video to the document. I am struggling to implement this in a reliable way.



Current infrastructure:




  • Database (postgres) of documents (text, title, author, list of media objects referencing S3)

  • Object store (S3) where the images/video/files are stored



The current flow:




  1. User creates a new document

  2. User makes changes, but doesn't save it. These changes are stored in localStorage so they are not lost on refresh.

  3. The user attaches an image

  4. The image displays a loading indicator as it is uploaded to S3 (or equivalent)

  5. The user saves the document, and the data is saved to the database. The media objects themselves are not stored there, only their S3 URLs.



Problem




  • If the user deletes the document before saving, or if saving fails, there will be orphan files in S3 that are not referenced by any documents.

  • A "delete document" action must now delete something from both Postgres and S3. Since a transaction cannot span two completely different services, the Postgres delete can succeed while the S3 delete fails, creating more orphan objects.



Attempts at solutions




  • I tried storing the media in localStorage and committing it all when the document is saved. This would solve the issue, but localStorage is limited to 5-10 MB, which is too small.

  • I considered a reaper daemon that queries the S3 references in the database and cross-references them with the objects stored in S3 to find orphan objects, which it would then delete automatically.



The reaper daemon would work, but it feels like a hack. I really don't want to manage an entirely new service just to store some files. Is there a better way to do this? What is the industry standard?



If it matters, I'm using React+Typescript and the text editor is built upon DraftJS.


Comments

Side-comment: I saw an interesting adaptation of this in canva.com -- I provided an image, but it was immediately available for use locally, even while the picture was still uploading to the back-end server. Lots of interesting hacks!

@JohnRotenstein you led me to discover the HTML5 FileSystem API, which looks like it might be able to replace the localStorage idea. I'll answer this question if this works!

Recommended answer

Here's the solution to the core problem of keeping the database and the object store consistent.


First, a couple of general rules:



  1. The database is the source of truth. If the object store disagrees with the database, the object store is wrong.

  2. Distributed consistency is easy as long as facts are only ever created, never deleted. See the Keeping CALM paper.


The database stores the following information about the objects:



  1. A unique ID. (Not a hash: two uploads of the same file must get two different IDs. Deduplicating objects via content-addressing is out of scope.)

  2. Upload timestamp. This is set after uploading the object to the object store under the object's ID.

  3. Deletion timestamp. This is set before deleting the object from the object store.


The timestamps are optional but immutable once set.


The object can be used for as long as it has an upload timestamp and doesn't have a deletion timestamp.
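
As a minimal sketch of the record described above, assuming a hypothetical `MediaRecord` type and helper names that do not come from the answer, the usability rule and the write-once timestamps could look like this in TypeScript:

```typescript
// Hypothetical shape of one row in a media-objects table (names are assumptions).
interface MediaRecord {
  id: string;              // unique ID, not derived from the file content
  uploadedAt: Date | null; // set once, after the upload has succeeded
  deletedAt: Date | null;  // set once, before the object is removed from the store
}

// Usable means: has an upload timestamp and has no deletion timestamp.
function isUsable(record: MediaRecord): boolean {
  return record.uploadedAt !== null && record.deletedAt === null;
}

// Timestamps are write-once: setting one that already exists changes nothing.
function markUploaded(record: MediaRecord, now: Date): MediaRecord {
  return record.uploadedAt === null ? { ...record, uploadedAt: now } : record;
}

function markDeleted(record: MediaRecord, now: Date): MediaRecord {
  return record.deletedAt === null ? { ...record, deletedAt: now } : record;
}
```

Because each helper only ever fills in a missing timestamp, repeating a call is harmless, which is what makes retries safe.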


It effectively goes through the following states:



  1. Database (DB): doesn't exist. Object store (OS): doesn't exist.

  2. DB: assigned ID. OS: doesn't exist.

  3. DB: assigned ID. OS: exists.

  4. DB: uploaded. OS: exists.

  5. DB: deleted. OS: exists.

  6. DB: deleted. OS: doesn't exist.

  7. DB: doesn't exist. OS: doesn't exist.


The application needs to perform two operations here: creating an object and deleting an object. Both are idempotent.


Creation:



  1. Assign ID.

  2. Upload object to object store.

  3. Add an upload timestamp.
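
The creation steps can be sketched as follows. This is an illustration only: the two `Map`s are in-memory stand-ins for Postgres and S3, the `Row` type and `createObject` function are hypothetical names, and the caller is assumed to supply the ID (for example a UUID) so that a retry reuses the same one.

```typescript
// In-memory stand-ins for Postgres and S3, for illustration only.
type Row = { id: string; assignedAt: number; uploadedAt?: number; deletedAt?: number };
const db = new Map<string, Row>();
const objectStore = new Map<string, string>();

// Idempotent creation: safe to call again with the same id after a failure.
// Returns true if the object is (now) uploaded, false if it was already deleted.
function createObject(id: string, body: string, now: number): boolean {
  // 1. Assign ID: insert the row only if it does not exist yet.
  let row = db.get(id);
  if (row === undefined) {
    row = { id, assignedAt: now };
    db.set(id, row);
  }
  // A row already marked as deleted must not be revived.
  if (row.deletedAt !== undefined) return false;
  // Already uploaded on an earlier attempt: nothing left to do.
  if (row.uploadedAt !== undefined) return true;
  // 2. Upload under the object's ID; repeating the write overwrites the same key.
  objectStore.set(id, body);
  // 3. Set the upload timestamp last, so the database never claims
  //    an upload that the object store does not hold.
  row.uploadedAt = now;
  return true;
}
```

If step 2 or 3 fails, the row simply stays in the "assigned ID" state without an upload timestamp, so the object is never considered usable and the call can be retried.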


Deletion:



  1. Add a deletion timestamp.

  2. Delete object from object store.

  3. Delete the record from the database.
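
A minimal sketch of the deletion steps, again using in-memory `Map`s as stand-ins for Postgres and S3 and a hypothetical `deleteObject` function. The deletion timestamp is set first, so the database already treats the object as gone before anything is destroyed:

```typescript
// In-memory stand-ins for Postgres and S3, for illustration only.
type Row = { id: string; assignedAt: number; uploadedAt?: number; deletedAt?: number };
const db = new Map<string, Row>();
const objectStore = new Map<string, string>();

// Idempotent deletion: safe to call again if an earlier attempt stopped midway.
function deleteObject(id: string, now: number): void {
  const row = db.get(id);
  // No record left: an earlier call already completed all three steps.
  if (row === undefined) return;
  // 1. Set the deletion timestamp (write-once); from here on the object is unusable.
  if (row.deletedAt === undefined) row.deletedAt = now;
  // 2. Remove the object from the store; removing a missing key is a no-op.
  objectStore.delete(id);
  // 3. Only after the store delete succeeded, drop the database record.
  db.delete(id);
}
```

If step 2 throws, the record stays behind with its deletion timestamp set, which is exactly the marker a later retry needs to finish the job.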


Two destructive updates happen here:



  1. the object is deleted from the object store. This is pure cleanup, as the database already considers the object to be deleted.

  2. the record is deleted from the database. At this point the system is no longer distributed as the object store no longer knows about the object.


Finally, do lightweight sweeping periodically to clean up failed operations:



  1. Look for objects that were assigned an ID a long time ago but still have no upload timestamp. (Needs an extra timestamp set when the ID is assigned.)

  2. Mark them as both uploaded and deleted by setting both timestamps. (This is a valid operation regardless of whether they've been saved to the object store.)

  3. Look for objects that have been marked as deleted a long time ago.

  4. Perform the full deletion operation on them again.
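
The sweep can be sketched in the same style. The `Map`s are in-memory stand-ins for Postgres and S3, and `sweep`, `assignedAt`, and `maxAgeMs` are hypothetical names; what counts as "a long time ago" is an assumption left to the caller.

```typescript
// In-memory stand-ins for Postgres and S3, for illustration only.
type Row = { id: string; assignedAt: number; uploadedAt?: number; deletedAt?: number };
const db = new Map<string, Row>();
const objectStore = new Map<string, string>();

// Periodic cleanup of creations and deletions that never finished.
function sweep(now: number, maxAgeMs: number): void {
  for (const row of Array.from(db.values())) {
    // Steps 1-2: an ID was assigned long ago but no upload was ever recorded.
    // Mark the row as uploaded and deleted so the deletion path takes over.
    if (row.uploadedAt === undefined && now - row.assignedAt > maxAgeMs) {
      row.uploadedAt = now;
      row.deletedAt = now;
    }
    // Steps 3-4: marked as deleted long ago, so rerun the rest of the deletion.
    if (row.deletedAt !== undefined && now - row.deletedAt > maxAgeMs) {
      objectStore.delete(row.id); // no-op if the object is already gone
      db.delete(row.id);
    }
  }
}
```

In this sketch a row abandoned during creation is only marked on one pass and physically removed on a later pass, once its deletion timestamp is itself older than `maxAgeMs`. Unlike the reaper daemon from the question, the sweep only queries the database and never lists the contents of the object store.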

