To speed up cold starts of my AWS Lambda function, I tried to include the HuggingFace transformer model that the function uses into the container by writing the following in the Dockerfile:
为了加快AWS Lambda函数的冷启动速度,我尝试通过在Dockerfile中写入以下代码,将该函数使用的HuggingFace转换器模型包含到容器中:
ENV HF_HOME=${LAMBDA_TASK_ROOT}/huggingface
ENV TRANSFORMERS_CACHE=${LAMBDA_TASK_ROOT}
ENV XDG_CACHE_HOME=${LAMBDA_TASK_ROOT}
COPY .cache/torch ${LAMBDA_TASK_ROOT}/torch
The function works and CloudWatch logs indicate that the model is not being downloaded. However, the logs contain the error:
该功能起作用,CloudWatch日志表明该模型未被下载。但是,日志包含以下错误:
There was a problem when trying to write in your cache folder (/var/task). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Even thought the function works, I cannot ignored the error, because the initialization stage during a cold start is still slow and I am being charged for it. The latter fact indicates that a lot of work is being done to initialize the model and that is precisely what I am trying to avoid.
即使函数可以工作,我也不能忽略这个错误,因为冷启动期间的初始化阶段仍然很慢,而且我要为此收费。后一个事实表明,正在做大量工作来初始化模型,这正是我试图避免的。
When I set the environment variables to point to /tmp and copy the model there, the model is downloaded during the cold start. It looks like /tmp is cleaned before the cold start.
当我将环境变量设置为指向/tmp并将模型复制到那里时,该模型将在冷启动期间下载。看起来像是在冷启动前清洗过了。
So, what is the correct way to include the model in the container to make the billed initialization phase during the cold start fast?
那么,将模型包含在容器中的正确方式是什么,以使冷启动期间的计费初始化阶段更快呢?
P.S. The question at AWS Re:Post: https://repost.aws/questions/QUswOf0xO2SAieaaPf9DZ_DQ/including-a-transformer-model-in-the-container-for-aws-lambda
附注:AWS Re的问题:帖子:https://repost.aws/questions/QUswOf0xO2SAieaaPf9DZ_DQ/including-a-transformer-model-in-the-container-for-aws-lambda
更多回答
优秀答案推荐
My current solution is to copy the model from ${LAMBDA_TASK_ROOT}
to /tmp
in the initialization part of the Lambda function itself. I wonder whether there is a more performant solution.
我目前的解决方案是将模型从${LAMBDA_TASK_ROOT}复制到Lambda函数本身的初始化部分中的/tmp。我想知道是否有更好的解决方案。
更多回答
我是一名优秀的程序员,十分优秀!