
python - Can I make SQLAlchemy do subquery eager loading without repeating the full original query?

Reposted. Author: 太空狗. Updated: 2023-10-29 20:24:11

Suppose we have an original generated query like this:

SELECT company.x AS company_x, ...
FROM company
LEFT OUTER JOIN acc ON acc.id = company.acc
LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
WHERE usergro_links.eid = %s OR usercomp_links.eid = %s

If we add .options(subqueryload(Company.childs)) to it, we get:

SELECT company.x AS company_x, ..., anon_1.company_id AS anon_1_company_id
FROM (
SELECT company.id AS company_id
FROM company
LEFT OUTER JOIN acc ON acc.id = company.acc
LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
WHERE usergro_links.eid = %s OR usercomp_links.eid = %s) AS anon_1
INNER JOIN acel_links AS acel_links_1 ON anon_1.company_id = acel_links_1.eid
INNER JOIN company ON company.id = acel_links_1.pid ORDER BY anon_1.company_id

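This is slow. If I took the company IDs from the first query and loaded all the child companies manually, it would be dramatically faster than what we get here.

I have read the documentation and looked at the code, but I cannot tell whether I can make SQLAlchemy take only the IDs from the result of the first query and load the children in a separate, comparatively simple query. I am not relying on this one example: I have hit harder cases where SQLAlchemy could not load the constructed query at all. Why redo all the work of the first query?

So, does anyone know how to do eager loading without the automatically constructed "join from join to join" style of query?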

Accepted answer

Update: the "select in" strategy is now implemented in SQLAlchemy (since v1.2): see "Select IN loading" in the documentation.
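As a minimal sketch of what that looks like in practice, assuming SQLAlchemy 1.4 or later and hypothetical Author/Book models that mirror the ones used later in this answer. The in-memory SQLite database and the statement-recording listener are there only for illustration and are not part of the original experiment:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    books = relationship("Book", back_populates="author")


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", back_populates="books")


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Author(name="author1", books=[Book(name="book1"), Book(name="book2")]),
        Author(name="author2", books=[Book(name="book3")]),
    ])
    session.commit()

statements = []  # every SQL statement sent to the database from here on


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    # Query 1 loads the authors. Query 2 loads their books by filtering on
    # the collected author ids with IN, instead of re-running query 1 as a
    # subquery the way subqueryload does.
    authors = session.query(Author).options(selectinload(Author.books)).all()
    titles = {a.name: sorted(b.name for b in a.books) for a in authors}

print(len(statements))
print(titles)
```

Printing the recorded statements is a quick way to confirm that the second query contains only the related table and an IN list, which is the behavior the question asks for.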

TLDR:

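I think the joinedload strategy should be used wherever possible, because it is more efficient than the other strategies, including the strategy suggested in the question of loading the related data with an "IN" statement.

The "IN" strategy is easy to implement "outside" SQLAlchemy (see the code below), and implementing it as a new loading strategy probably would not be complicated either, since logically it resembles the existing subqueryload strategy.

Full version:

I started with a simple experiment to see the queries produced by the different strategies.

The full source code of the experiment is on GitHub.

My models look like this: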

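from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relationship

# Assumption: ModelBase is a plain declarative base; its definition is not
# shown in the original answer.
ModelBase = declarative_base()


class Author(ModelBase):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(255))


class Book(ModelBase):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey('authors.id'))
    # The backref also creates the reverse Author.books collection.
    author = relationship('Author', backref=backref('books'))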

Now the tests, starting with lazy loading:

books = session.query(Book).all()
print(books[0].author.name)
session.commit()

Output (cleaned up):

-------------Lazy--------------
sqlalchemy.engine.base.Engine:
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT
authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)
author1

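As expected, lazy loading runs one query to fetch the books, plus one more query each time an author that is not yet loaded is accessed.

Subquery loading:

books = session.query(Book).options(subqueryload(Book.author)).all()
print(books[0].author.name)
session.commit()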

-------------Subquery----------
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT
authors.id AS authors_id, authors.name AS authors_name,
anon_1.books_author_id AS anon_1_books_author_id
FROM (
SELECT DISTINCT books.author_id AS books_author_id
FROM books) AS anon_1
JOIN authors
ON authors.id = anon_1.books_author_id
ORDER BY anon_1.books_author_id
author1

With subquery loading we get two queries: one fetches the books, the other fetches the authors using a subquery.

Joined loading:

books = session.query(Book).options(joinedload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Joined------------
SELECT
books.id AS books_id, books.name AS books_name,
books.author_id AS books_author_id,
authors_1.id AS authors_1_id, authors_1.name AS authors_1_name
FROM books
LEFT OUTER JOIN authors AS authors_1 ON authors_1.id = books.author_id
author1

The joined strategy runs a single query that fetches both the books and the authors.

Immediate loading:

books = session.query(Book).options(immediateload(Book.author)).all()
print(books[0].author.name)
session.commit()

-------------Immediate---------
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT
authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)

SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(2,)

author1

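The immediate strategy loads the books with the first query and then, as part of the same load rather than on first attribute access, fetches the related data with a separate query for each related record. That is why both author queries appear in the log before the first name is printed.

It looks like joinedload() should be the most efficient in most cases (and more efficient than the "IN" strategy): we fetch all the data with a single query.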
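The statement counts behind this comparison can be reproduced with a small harness. The following is only a sketch, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and four books shared between two authors; the count_queries helper and the listener are hypothetical names introduced here. It counts statements only, so it says nothing about how expensive each statement is:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import (
    Session, declarative_base, joinedload, relationship, selectinload,
    subqueryload,
)

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1), Book(name="book2", author=a1),
        Book(name="book3", author=a2), Book(name="book4", author=a2),
    ])
    session.commit()

query_count = 0


@event.listens_for(engine, "before_cursor_execute")
def count_query(conn, cursor, statement, parameters, context, executemany):
    global query_count
    query_count += 1


def count_queries(*options):
    """Load all books with the given loader options, touch every author,
    and return the number of SQL statements that were executed."""
    global query_count
    query_count = 0
    with Session(engine) as session:  # fresh session, empty identity map
        books = session.query(Book).options(*options).all()
        for book in books:
            book.author.name  # triggers a lazy load if not already loaded
    return query_count


counts = {
    "lazy": count_queries(),
    "subqueryload": count_queries(subqueryload(Book.author)),
    "selectinload": count_queries(selectinload(Book.author)),
    "joinedload": count_queries(joinedload(Book.author)),
}
print(counts)
```

On this data set joinedload should report a single statement and the two-step strategies two, which matches the logs above. On real data, timing each variant is more informative than counting statements, since the joined query returns wider rows.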

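Now, let's try to implement the IN strategy outside SQLAlchemy:

print('-------------IN----------------')
books = session.query(Book).all()
# Collect the distinct author ids referenced by the loaded books.
ids = set()
for b in books:
    ids.add(b.author_id)
# One query loads every referenced author into the session's identity map.
authors = session.query(Author).filter(Author.id.in_(ids)).all()
# These accesses are served from the identity map; no further queries run.
print(books[0].author.name)
print(books[1].author.name)
print(books[2].author.name)
print(books[3].author.name)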

Output:

-------------IN----------------
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books

SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id IN (?, ?)
INFO:sqlalchemy.engine.base.Engine:(1, 2)

author1
author1
author2
author2

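As we can see, it runs two queries, and after that we can access all the authors.

Note that we never explicitly attached the authors to the books, yet accessing an author through a book still works: SQLAlchemy finds the author records in its internal identity map and does not run any additional database queries.

Code like the "IN" strategy above can be generalized into a function that works with any model/relationship. An "IN" strategy should probably also be relatively easy to implement as a new SQLAlchemy strategy: like the existing subqueryload, it just needs to run a second query to fetch the related data.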
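One way such a generalized function might look is sketched below. It assumes SQLAlchemy 1.4 or later and covers only a simple many-to-one relationship whose single foreign key column points at the target's primary key. load_related_in is a hypothetical name, not a SQLAlchemy API, and the models, SQLite database, and counter exist only to make the sketch self-contained:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", backref="books")


def load_related_in(session, parents, relationship_attr):
    """Hypothetical helper: preload a simple many-to-one relationship for
    already loaded parent objects with a single IN query.

    Returns the related objects. The caller should keep that list alive,
    because the session's identity map only holds weak references.
    """
    prop = relationship_attr.property
    pairs = prop.local_remote_pairs
    if len(pairs) != 1:
        raise ValueError("only single-column foreign keys are supported")
    local_col, remote_col = pairs[0]
    # Name of the mapped attribute that holds the foreign key value.
    local_key = prop.parent.get_property_by_column(local_col).key
    ids = {getattr(parent, local_key) for parent in parents}
    ids.discard(None)  # parents without a related object
    if not ids:
        return []
    target = prop.mapper.class_
    return session.query(target).filter(remote_col.in_(sorted(ids))).all()


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1, a2 = Author(name="author1"), Author(name="author2")
    session.add_all([
        Book(name="book1", author=a1), Book(name="book2", author=a1),
        Book(name="book3", author=a2), Book(name="book4", author=a2),
    ])
    session.commit()

query_count = 0


@event.listens_for(engine, "before_cursor_execute")
def count_query(conn, cursor, statement, parameters, context, executemany):
    global query_count
    query_count += 1


with Session(engine) as session:
    books = session.query(Book).all()
    authors = load_related_in(session, books, Book.author)  # keep this list
    # Each access below is resolved from the identity map by primary key.
    names = [book.author.name for book in books]

print(query_count, names)
```

The sketch relies on the same identity-map lookup described above, so it would not help for collections (one-to-many) or for relationships that do not join on the target's primary key. For those cases the built-in selectinload is the safer choice.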

Regarding "python - Can I make SQLAlchemy do subquery eager loading without repeating the full original query?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/26704076/
