gpt4 book ai didi

python - 使用Vicuna + langchain + llama_index 创建自托管LLM模型

转载 作者:行者123 更新时间:2023-12-02 05:48:09 28 4
gpt4 key购买 nike

我想创建一个自托管的 LLM 模型,该模型将能够拥有我自己的自定义数据的上下文(就此而言,Slack 对话)。

我听说 Vicuna 是 ChatGPT 的一个很好的替代品,所以我编写了以下代码:

from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex, \
GPTSimpleVectorIndex, PromptHelper, LLMPredictor, Document, ServiceContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
import torch
from langchain.llms.base import LLM
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

!export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

class CustomLLM(LLM):
model_name = "eachadea/vicuna-13b-1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

pipeline = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0,
model_kwargs={"torch_dtype":torch.bfloat16})

def _call(self, prompt, stop=None):
return self.pipeline(prompt, max_length=9999)[0]["generated_text"]

def _identifying_params(self):
return {"name_of_model": self.model_name}

def _llm_type(self):
return "custom"


llm_predictor = LLMPredictor(llm=CustomLLM())

但遗憾的是我遇到了以下错误:

OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB (GPU 0; 22.03 GiB total capacity; 21.65 GiB 
already allocated; 94.88 MiB free; 21.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated
memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and
PYTORCH_CUDA_ALLOC_CONF

这是 !nvidia-smi 的输出(在运行任何内容之前):

Thu Apr 20 18:04:00 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A10G Off| 00000000:00:1E.0 Off | 0 |
| 0% 23C P0 52W / 300W| 0MiB / 23028MiB | 18% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

知道如何修改我的代码以使其正常工作吗?

最佳答案

长度太长,9999会消耗大量的GPU RAM,特别是使用13b模型。尝试7b型号。并尝试使用 peft/bitsandbytes 之类的东西来减少 GPU RAM 使用。设置 load_in_8bit=True 是一个好的开始。

关于python - 使用Vicuna + langchain + llama_index 创建自托管LLM模型,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/76067104/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com