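Using Word Embeddings and a Vector Database in C# to Give a Large Language Model (LLM) Long-Term Memory for a Private-Domain Q&A Bot: Open-Source Replacements for the OpenAI API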


Repost author: 我是一只小鸟 | Updated: 2023-05-25 14:31:27


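The previous article outlined how word embeddings can give a large language model long-term memory so that it can be applied to private-domain questions. That approach used the OpenAI API both to generate the embedding vectors and to call the chat model.

The OpenAI API is hard to reach from mainland China, so this article introduces two open-source replacements: one for embedding generation and one for text generation.

As usual, we start with a simple topology diagram.

The topology is simple: one backend service for business logic, two AI model services (one for embeddings, one for text generation), and a vector database (es is still used here, and in the rest of the article). Next comes the flow.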

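The flow still has two stages. In stage one we prepare the private-domain answer texts: each text is sent as a string to the embedding endpoint, the endpoint returns its embedding vector, and the text and vector are written to the vector store as an es index. In stage two, when the service is live, we call the embedding endpoint on the user's question to get its embedding vector, then use similarity matching in the vector database to fetch the closest texts. For example, given the question "How much salt should I add when stir-frying pork with green peppers?", the vector store returns one or more entries if it holds texts about that dish. The backend then builds a prompt, much as in the previous article, and finally calls the text generation model to answer the question. That completes the flow.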

Next we look at how to deploy and use these models, and at the C# code that ties them together.

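Important: before you start, make sure the deployment machine has either an Nvidia GPU with 16G of VRAM or at least 48G of RAM. The former runs inference on the GPU, which gives good results at a reasonable generation speed. The latter runs inference on the CPU, which is slow and only suitable for deployment testing. For a GPU deployment you also need to install CUDA 11.8 and the nvidia-docker2 package for GPU support in docker; the installation steps are not covered here.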
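As a rough sanity check on these hardware numbers, the weights-only footprint of a model can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 6.2 billion parameters for the ChatGLM-6B model deployed later in this article (an approximate figure that the article itself does not state) and ignores activations, caches, and runtime overhead, so real usage will be higher.

```python
# Weights-only memory estimate; real usage is higher (activations, caches, runtime overhead).
PARAMS = 6.2e9  # assumed approximate parameter count for ChatGLM-6B

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, precision: str) -> float:
    """Return the size of the weights in GB (10^9 bytes) at the given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_gb(PARAMS, precision):.1f} GB")
```

Under these assumptions the fp16 weights alone come to about 12.4 GB, which fits the 16G VRAM recommendation, while fp32 on the CPU needs about twice that before any overhead.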

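First we need to download the embedding model. The recommended model here is text2vec-large-chinese, which has been fine-tuned on Chinese text and performs well.

Download it from: https://huggingface.co/GanymedeNil/text2vec-large-chinese/tree/main

We need three files from it to build the embedding service: pytorch_model.bin, config.json, and vocab.txt.

Next, create a file named web.py in the folder that holds the downloaded files, with the following content: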

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModel
import torch

app = FastAPI()

# Load the model and tokenizer
model = AutoModel.from_pretrained("/app").half().cuda()
tokenizer = AutoTokenizer.from_pretrained("/app")


# Request body
class Sentence(BaseModel):
    sentence: str


@app.post("/embed")
async def embed(sentence: Sentence):
    # Tokenize the sentence and get the input tensors
    inputs = tokenizer(sentence.sentence, return_tensors='pt', padding=True, truncation=True, max_length=512)

    # Move inputs to GPU
    for key in inputs.keys():
        inputs[key] = inputs[key].to('cuda')

    # Run the model
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the embeddings
    embeddings = outputs.last_hidden_state[0].cpu().numpy()

    # Return the embeddings as a JSON response
    return embeddings.tolist()

The code above is the GPU version of the API. If you have no GPU, use the following code instead:

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModel
import torch

app = FastAPI()

# Load the model and tokenizer.
# No .half() here: half precision is poorly supported on CPU in many PyTorch versions.
model = AutoModel.from_pretrained("/app")
tokenizer = AutoTokenizer.from_pretrained("/app")

# Request body
class Sentence(BaseModel):
    sentence: str

@app.post("/embed")
async def embed(sentence: Sentence):
    # Tokenize the sentence and get the input tensors
    inputs = tokenizer(sentence.sentence, return_tensors='pt', padding=True, truncation=True, max_length=512)

    # No need to move inputs to GPU as we are using CPU

    # Run the model
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the embeddings
    embeddings = outputs.last_hidden_state[0].cpu().numpy()

    # Return the embeddings as a JSON response
    return embeddings.tolist()

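Here we use FastAPI, a simple Python web framework, to expose the service. Next, put the downloaded model files and the Python code in the same folder and create a requirements.txt so that the dependencies are installed when the image is built. requirements.txt contains: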

torch
transformers
fastapi
uvicorn

The first two are the libraries the model needs; the last two are needed by the web service. Next, write a Dockerfile to build the image:

FROM python:3.8-slim-buster

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt

# Settings for the uvicorn command below: web.py exposes the FastAPI object "app"
ENV MODULE_NAME=web
ENV VARIABLE_NAME=app
ENV HOST=0.0.0.0
ENV PORT=80

# Run the application
CMD uvicorn ${MODULE_NAME}:${VARIABLE_NAME} --host ${HOST} --port ${PORT}

Now we can build the image from these files. Run docker build . -t myembed:latest and wait for the build to finish.

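Once the image is built, run it locally: docker run -dit --gpus all -p 8080:80 myembed:latest. In a CPU-only environment, leave out "--gpus all". Then use Postman to send a request to the endpoint and check that it produces vectors. If all goes well, it returns a nested array containing one vector per token.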
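The same check can be scripted instead of using Postman. The sketch below is one possible client, not part of the original project: the URL is a hypothetical local address, and averaging the per-token vectors into a single sentence vector is just one simple pooling choice (the C# client later in this article also averages over tokens).

```python
import json
import urllib.request
from typing import List

# Hypothetical address; adjust to wherever the myembed container is published.
EMBED_URL = "http://localhost:8080/embed"


def fetch_token_vectors(sentence: str, url: str = EMBED_URL) -> List[List[float]]:
    """POST a sentence to /embed and return the per-token vectors from the response."""
    body = json.dumps({"sentence": sentence}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average the token vectors element-wise into one sentence vector."""
    if not token_vectors:
        raise ValueError("expected at least one token vector")
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]
```

With the container running, mean_pool(fetch_token_vectors("some sentence")) should give a single vector whose length equals the model's hidden size.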

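Next we set up the large language model endpoint in the same way. Here we use ChatGLM-6B, a relatively mature open-source LLM from China. First create a new folder, then use git to pull the code for its web service: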

git clone https://github.com/THUDM/ChatGLM-6B.git

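Next, download the model weights from https://huggingface.co/THUDM/chatglm-6b/tree/main. Download the 8 weight files, pytorch_model-00001-of-00008.bin through pytorch_model-00008-of-00008.bin, and place them in the root of the git checkout. Because the code below loads everything from /app, the configuration, tokenizer, and model code files from the same Hugging Face repository most likely need to sit next to the weights as well.

Then modify api.py as follows: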

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from transformers import AutoTokenizer, AutoModel
import uvicorn, json
import torch

DEVICE = "cuda"
DEVICE_ID = "0"
CUDA_DEVICE = f"{DEVICE}:{DEVICE_ID}" if DEVICE_ID else DEVICE


def torch_gc():
    # Release cached GPU memory once a request has finished
    if torch.cuda.is_available():
        with torch.cuda.device(CUDA_DEVICE):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

app = FastAPI()

@app.post("/chat", response_class=StreamingResponse)
async def create_item(request: Request):
    global model, tokenizer
    # request.json() already returns a dict, so no dumps/loads round trip is needed
    json_post_list = await request.json()
    prompt = json_post_list.get('prompt')
    history = json_post_list.get('history')
    max_length = json_post_list.get('max_length')
    top_p = json_post_list.get('top_p')
    temperature = json_post_list.get('temperature')

    last_response = ''
    async def stream_chat():
        nonlocal last_response, history
        for response, history in model.stream_chat(tokenizer,
                                                prompt,
                                                history=history,
                                                max_length=max_length if max_length else 2048,
                                                top_p=top_p if top_p else 0.7,
                                                temperature=temperature if temperature else 0.95):
            # stream_chat yields the full text so far; send only the newly generated part
            new_part = response[len(last_response):]
            last_response = response
            # One JSON string per line, so line-based clients can read fragments as they arrive
            yield json.dumps(new_part, ensure_ascii=False) + "\n"
        torch_gc()

    return StreamingResponse(stream_chat(), media_type="text/plain")


if __name__ == '__main__':
    tokenizer = AutoTokenizer.from_pretrained("/app", trust_remote_code=True)
    model = AutoModel.from_pretrained("/app", trust_remote_code=True).half().cuda()
    model.eval()
    uvicorn.run(app, host='0.0.0.0', port=80, workers=1)

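Likewise, in a CPU-only environment, change

model = AutoModel.from_pretrained("/app", trust_remote_code=True).half().cuda()

to

model = AutoModel.from_pretrained("/app", trust_remote_code=True).float()

Here .float() keeps the weights in FP32, since half precision is aimed at GPU inference. Note that if you do have a GPU but it has less than 16G of VRAM, you can consider 8-bit or 4-bit quantization; see the readme.md at https://github.com/THUDM/ChatGLM-6B for details.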

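The rest of the process is similar to deploying the embedding model. The project already includes a requirements.txt, so we only need to create a Dockerfile like the one for the embedding service and build it.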

FROM python:3.8-slim-buster
WORKDIR /app
ADD . /app
RUN pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
CMD ["python", "api.py"]

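After building the image (for example with docker build . -t myllm:latest), start it for testing with docker run -dit --gpus all -p 8081:80 myllm:latest, and again use Postman to send a request to the endpoint. If all goes well, the generated answer comes back in the response. Any garbled characters in it are emoji that were not decoded correctly and can be ignored.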
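The /chat endpoint writes each newly generated piece of text with json.dumps, so the response body is a sequence of JSON string literals rather than one JSON document. The sketch below shows one way a client could decode such a stream incrementally. It is an illustration, not part of the ChatGLM-6B project: decode_fragments is a hypothetical helper, it assumes the chunks have already been decoded to text, and it accepts literals with or without whitespace between them.

```python
import json
from typing import Iterable, Iterator


def decode_fragments(chunks: Iterable[str]) -> Iterator[str]:
    """Yield decoded text fragments from a stream of JSON string literals.

    A chunk may end in the middle of a literal; incomplete data stays in the
    buffer until the rest arrives.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # the literal is incomplete; wait for more data
            yield value
            buffer = buffer[end:]
```

Joining the yielded fragments with "".join(...) reconstructs the full answer, and because each fragment goes through a JSON decoder, escaped characters come out as the model produced them.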

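Next we build the C# backend that connects these services. Here I use a local static dictionary to simulate storing the embedding vectors and finding similar texts by cosine similarity, rather than repeating how to use es as the vector store; the two give essentially the same results. Interested readers can look up the NEST library and cosine-similarity search in es.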
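For readers who do want to use es as the vector store, the request bodies might look roughly like the sketch below, written as plain Python dictionaries so it does not depend on any client library. The field names and the helper functions are assumptions made for illustration, and the exact syntax depends on the Elasticsearch version in use.

```python
from typing import Any, Dict, List

VECTOR_FIELD = "embedding"  # hypothetical field name
VECTOR_DIMS = 1024          # vector size used by the C# client in this article


def index_mapping() -> Dict[str, Any]:
    """Index mapping with the stored text and a dense_vector field for its embedding."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                VECTOR_FIELD: {"type": "dense_vector", "dims": VECTOR_DIMS},
            }
        }
    }


def cosine_query(query_vector: List[float], size: int = 1) -> Dict[str, Any]:
    """script_score query that ranks documents by cosine similarity to query_vector."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # + 1.0 keeps the score non-negative, which script_score requires
                    "source": f"cosineSimilarity(params.query_vector, '{VECTOR_FIELD}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }
```

The mapping would be sent once when creating the index, and the query body would be posted to the index's _search endpoint for each user question.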

The core code follows. It exposes two endpoints: the first takes text from the frontend, embeds it, and stores it; the second answers questions.

/// Simulates the vector store. Static so that entries survive across requests (demo only, not thread-safe).
private static readonly Dictionary<string, List<double>> MemoryList = new Dictionary<string, List<double>>();
/// Cosine similarity between two vectors
double Compute(List<double> vector1, List<double> vector2) => vector1.Zip(vector2, (a, b) => a * b).Sum() / (Math.Sqrt(vector1.Sum(a => a * a)) * Math.Sqrt(vector2.Sum(b => b * b)));
...
    [HttpPost("/api/save")]
    public async Task<int> SaveMemory(string str)
    {
        if (!string.IsNullOrEmpty(str))
        {
            // Each line of the input becomes one memory entry
            foreach (var x in str.Split("\n"))
            {
                if (!MemoryList.ContainsKey(x))
                {
                    MemoryList.Add(x, await GetEmbeding(x));
                }
            }
        }
        return MemoryList.Count;
    }
...
    [HttpPost("/api/chat")]
    public async IAsyncEnumerable<string> SendData(string content)
    {
        if (!string.IsNullOrEmpty(content))
        {
            var userquestionEmbeding = await GetEmbeding(content);
            var prompt = "";
            if (MemoryList.Any())
            {  // Take the single best match from the vector store; depending on your needs you could apply a similarity threshold or return several entries
                prompt = MemoryList.OrderByDescending(x => Compute(userquestionEmbeding, x.Value)).First().Key;
                // Prompt (in Chinese): "You are a Q&A assistant; answer based on the following facts: ... The user's question is: ... Do not make up facts. Answer:"
                prompt = $"你是一个问答小助手，你需要基于以下事实依据回答问题，事实依据如下：{prompt}。用户的问题如下：{content}。不要编造事实依据，请回答：";
            }
            else
                prompt = content;
            await foreach (var item in ChatStream(prompt))
            {
                yield return item;
            }
        }
    }
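The comment in SendData notes that retrieval could use a similarity threshold or return several entries instead of only the best match. The following Python sketch illustrates that variation under stated assumptions: retrieve and build_prompt are hypothetical helpers, and the defaults k=3 and threshold=0.5 are arbitrary illustrative values that would need tuning on real data.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: List[float], memory: Dict[str, List[float]],
             k: int = 3, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return up to k (text, score) pairs with score >= threshold, best first."""
    scored = [(text, cosine(query, vector)) for text, vector in memory.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question: str, facts: List[str]) -> str:
    """Use the article's prompt template; fall back to the bare question without facts."""
    if not facts:
        return question
    joined = "\n".join(facts)
    return (f"你是一个问答小助手，你需要基于以下事实依据回答问题，事实依据如下：{joined}。"
            f"用户的问题如下：{question}。不要编造事实依据，请回答：")
```

Falling back to the bare question when nothing clears the threshold keeps unrelated memories out of the prompt, at the cost of the model answering from its own knowledge in that case.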

We also need two functions that use HttpClient to call the AI model APIs:

    // Shared client: creating a new HttpClient per call can exhaust sockets under load
    private static readonly HttpClient hc = new HttpClient();

    async IAsyncEnumerable<string> ChatStream(string x)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, "http://192.168.1.100:8081/chat")
        {
            Content = new StringContent(System.Text.Json.JsonSerializer.Serialize(new { prompt = x }), Encoding.UTF8, "application/json")
        };
        // ResponseHeadersRead returns once the headers arrive, so the body can be read while it streams
        using var response = await hc.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        if (response.IsSuccessStatusCode)
        {
            var responseStream = await response.Content.ReadAsStreamAsync();
            using (var reader = new StreamReader(responseStream, Encoding.UTF8))
            {
                string line;
                while ((line = await reader.ReadLineAsync()) != null)
                {
                    yield return line;
                }
            }
        }
    }
    async Task<List<double>> GetEmbeding(string x)
    {
        var reqcontent = new StringContent(System.Text.Json.JsonSerializer.Serialize(new { sentence = x }), Encoding.UTF8, "application/json");
        var result = await hc.PostAsync("http://192.168.1.100:8080/embed", reqcontent);
        result.EnsureSuccessStatusCode();
        var content = await result.Content.ReadAsStringAsync();
        // The service returns one vector per token: [tokens][dims]
        var embed = System.Text.Json.JsonSerializer.Deserialize<List<List<double>>>(content);
        // Average over the tokens to get a single sentence vector
        var dims = embed[0].Count; // 1024 for text2vec-large-chinese
        var embedresult = new List<double>(dims);
        for (var i = 0; i < dims; i++)
        {
            double sum = 0;
            foreach (List<double> sublist in embed)
            {
                sum += sublist[i];
            }
            // Divide by the number of tokens, not by the vector dimension
            embedresult.Add(sum / embed.Count);
        }
        return embedresult;
    }

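Now we can test the result. With no memories loaded, the model makes up an answer when asked a question.

After adding several memories to the vector store and asking again, the model answers largely correctly.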

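That is all for this post. Compared with the previous installment, which relied on the OpenAI API, a local deployment should suit most readers' situations better.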
