
django - Using python async/await with Django REST framework

Reposted · Author: 行者123 · Updated: 2023-12-04 11:43:14

I just upgraded an old project to Python 3.6 and discovered these cool new async/await keywords.

My project contains a web crawler that currently performs poorly, taking about 7 minutes to complete.
Since I already have django restframework installed to access my Django app's data, I thought it would be nice to have a REST endpoint that lets me start the crawler remotely with a simple POST request.

However, I don't want the client to wait synchronously for the crawler to finish. I just want to immediately send back a message that the crawler has been started, and kick off the crawler in the background.

from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings
from mycrawler import tasks

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    await tasks.update_all(deep_crawl, season, log_to_db)


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)

        # this should be called async
        update_all_async(season=season, deep_crawl=deep)

        return Response({"success": "crawl started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.",
                         "allowed_parameters": {
                             "deep": "boolean",
                             "season": "number"
                         }}, status=status.HTTP_200_OK)

I have read some tutorials about event loops and such, but I really don't understand them... where should I start the loop in this case?

[Edit] October 10, 2017:

I have now solved it using threads, since it really is a "fire and forget" task. However, I would still like to know how to achieve the same thing with async/await.

Here is my current solution:
import threading


@api_view(['POST', 'GET'])
def start(request):
    ...
    t = threading.Thread(target=tasks.update_all, args=(deep, season))
    t.start()
    ...

Best answer

This is possible in Django 3.1+, after the introduction of asynchronous support.
To have a running event loop, run Django through uvicorn, or any other ASGI server, instead of gunicorn or another WSGI server.
The difference is that with an ASGI server there is already a running loop, while with WSGI one needs to be created. With ASGI, you can simply define async functions directly in views.py, or in the inherited functions of its view classes.
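Concretely, the server swap might look like the following; `myproject` is a placeholder for your actual project name (Django has generated a `myproject/asgi.py` module since version 3.0):

```shell
pip install uvicorn

# Before (WSGI, no running event loop):
#   gunicorn myproject.wsgi:application
# After (ASGI, a running event loop is available to views):
uvicorn myproject.asgi:application --host 0.0.0.0 --port 8000
```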
Assuming you use ASGI, there are several ways to achieve this; I will describe a couple of them (other options could, for example, use asyncio.Queue):

  • Make start() async

  • By making start() async, you can use the existing running loop directly, and with asyncio.Task you can fire and forget into that loop. If you want to fire but remember, you can create another Task to follow up on the first one, i.e.:
    from rest_framework import status
    from rest_framework.decorators import api_view
    from rest_framework.response import Response
    from django.conf import settings
    from mycrawler import tasks

    import asyncio

    async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
        await tasks.update_all(deep_crawl, season, log_to_db)

    async def follow_up_task(task: asyncio.Task):
        await asyncio.sleep(5)  # Or any other reasonable number, or a finite loop...
        if task.done():
            print('update_all task completed: {}'.format(task.result()))
        else:
            print('task not completed after 5 seconds, aborting')
            task.cancel()


    @api_view(['POST', 'GET'])
    async def start(request):
        """
        Start crawling.
        """
        if request.method == 'POST':
            print("Crawler: start {}".format(request))

            deep = request.data.get('deep', False)
            season = request.data.get('season', settings.CURRENT_SEASON)

            # Once the task is created, it will begin running in parallel
            loop = asyncio.get_running_loop()
            task = loop.create_task(update_all_async(season=season, deep_crawl=deep))

            # Fire up a task to track the previous one
            loop.create_task(follow_up_task(task))

            return Response({"success": "crawl started"}, status=status.HTTP_200_OK)
        else:
            return Response({"description": "Start the crawler by calling this endpoint via POST.",
                             "allowed_parameters": {
                                 "deep": "boolean",
                                 "season": "number"
                             }}, status=status.HTTP_200_OK)
  • async_to_sync

  • Sometimes you simply can't have an async function that the request gets routed to in the first place, as happens with DRF (as of today).
    For that, Django provides some useful async adapter functions, but be aware that switching from the sync context to the async one, or vice versa, comes with a small performance penalty of approximately 1 ms. Note that this time, the running loop is obtained inside the update_all_async function instead:
    from rest_framework import status
    from rest_framework.decorators import api_view
    from rest_framework.response import Response
    from django.conf import settings
    from mycrawler import tasks

    import asyncio
    from asgiref.sync import async_to_sync

    @async_to_sync
    async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
        # We can use the running loop here in this use case
        loop = asyncio.get_running_loop()
        task = loop.create_task(tasks.update_all(deep_crawl, season, log_to_db))
        loop.create_task(follow_up_task(task))

    async def follow_up_task(task: asyncio.Task):
        await asyncio.sleep(5)  # Or any other reasonable number, or a finite loop...
        if task.done():
            print('update_all task completed: {}'.format(task.result()))
        else:
            print('task not completed after 5 seconds, aborting')
            task.cancel()


    @api_view(['POST', 'GET'])
    def start(request):
        """
        Start crawling.
        """
        if request.method == 'POST':
            print("Crawler: start {}".format(request))

            deep = request.data.get('deep', False)
            season = request.data.get('season', settings.CURRENT_SEASON)

            # update_all_async is already wrapped by the async_to_sync
            # decorator above, so it can be called like a regular
            # synchronous function here
            update_all_async(season=season, deep_crawl=deep)

            return Response({"success": "crawl started"}, status=status.HTTP_200_OK)
        else:
            return Response({"description": "Start the crawler by calling this endpoint via POST.",
                             "allowed_parameters": {
                                 "deep": "boolean",
                                 "season": "number"
                             }}, status=status.HTTP_200_OK)
    In both cases, the function returns the 200 quickly, but technically the second option is slower.
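The asyncio.Queue option mentioned earlier can be sketched as follows. The job payload and worker body are illustrative only; a real worker would call tasks.update_all(...) instead of building a string, and the worker task would be started once at application startup rather than per demo run:

```python
import asyncio

async def crawl_worker(queue: asyncio.Queue, results: list):
    # Consume crawl jobs until a None sentinel arrives; a real worker
    # would run the crawler here instead of appending a string.
    while True:
        job = await queue.get()
        if job is None:
            queue.task_done()
            break
        results.append("crawled season {}".format(job))
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    results = []
    # Start the worker once; views would only enqueue jobs and return.
    worker = asyncio.create_task(crawl_worker(queue, results))
    await queue.put(2017)   # what a POST handler would do
    await queue.put(None)   # shut the worker down for this demo
    await queue.join()      # wait until every job is marked done
    await worker
    return results

print(asyncio.run(main()))  # ['crawled season 2017']
```

The view stays fast because `queue.put` returns as soon as the job is enqueued; the worker drains the queue on the event loop in the background.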

    IMPORTANT: When using Django, it is common to have DB operations involved in these async operations. DB operations in Django can only be synchronous, at least for now, so you will have to consider this in asynchronous contexts. sync_to_async() becomes very handy for these cases.
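sync_to_async() ships with asgiref (a Django dependency). A minimal sketch of the same offloading pattern using only the standard library, with asyncio.to_thread (Python 3.9+) and a hypothetical blocking function standing in for an ORM query:

```python
import asyncio
import time

def fetch_team_names():
    # Hypothetical stand-in for a blocking Django ORM call, e.g.
    # list(Team.objects.values_list("name", flat=True))
    time.sleep(0.1)  # simulate database latency
    return ["FC Example", "Test United"]

async def update_all():
    # Offload the blocking call to a worker thread so the event loop
    # stays responsive; asgiref's sync_to_async wraps ORM calls in
    # essentially the same way inside Django.
    names = await asyncio.to_thread(fetch_team_names)
    return names

print(asyncio.run(update_all()))  # ['FC Example', 'Test United']
```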

    On "django - Using python async/await with Django REST framework", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46820009/
