python - LangChain conversational retrieval with JSONLoader

Reposted. Author: 行者123. Updated: 2023-12-02 22:46:49

I modified the data loader in this source code, https://github.com/techleadhd/chatgpt-retrieval, so that ConversationalRetrievalChain accepts JSON data.

I created a dummy JSON file that, according to the LangChain documentation, follows the JSON structure described there:

{
  "reviews": [
    {"text": "Great hotel, excellent service and comfortable rooms."},
    {"text": "I had a terrible experience at this hotel. The room was dirty and the staff was rude."},
    {"text": "Highly recommended! The hotel has a beautiful view and the staff is friendly."},
    {"text": "Average hotel. The room was okay, but nothing special."},
    {"text": "I absolutely loved my stay at this hotel. The amenities were top-notch."},
    {"text": "Disappointing experience. The hotel was overpriced for the quality provided."},
    {"text": "The hotel exceeded my expectations. The room was spacious and clean."},
    {"text": "Avoid this hotel at all costs! The customer service was horrendous."},
    {"text": "Fantastic hotel with a great location. I would definitely stay here again."},
    {"text": "Not a bad hotel, but there are better options available in the area."}
  ]
}
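To pin down what the loader should produce, here is a plain-Python sketch (standard library only, no jq or LangChain) that mimics jq_schema=".reviews[]" with content_key='text': it walks the top-level "reviews" array and keeps each element's "text" value. The JSON is inlined as a string so the snippet runs on its own, and the helper name extract_texts is hypothetical. It shows the file holds 10 reviews, so JSONLoader is expected to return 10 documents, one per review.

```python
import json

# The dummy review file from the question, inlined so the sketch runs on its own.
REVIEW_JSON = """
{
  "reviews": [
    {"text": "Great hotel, excellent service and comfortable rooms."},
    {"text": "I had a terrible experience at this hotel. The room was dirty and the staff was rude."},
    {"text": "Highly recommended! The hotel has a beautiful view and the staff is friendly."},
    {"text": "Average hotel. The room was okay, but nothing special."},
    {"text": "I absolutely loved my stay at this hotel. The amenities were top-notch."},
    {"text": "Disappointing experience. The hotel was overpriced for the quality provided."},
    {"text": "The hotel exceeded my expectations. The room was spacious and clean."},
    {"text": "Avoid this hotel at all costs! The customer service was horrendous."},
    {"text": "Fantastic hotel with a great location. I would definitely stay here again."},
    {"text": "Not a bad hotel, but there are better options available in the area."}
  ]
}
"""


def extract_texts(raw: str, array_key: str = "reviews", content_key: str = "text") -> list[str]:
    """Hypothetical helper mimicking jq '.reviews[]' plus content_key='text'."""
    data = json.loads(raw)
    # '.reviews[]' yields each element of the array; content_key picks its "text" field.
    return [item[content_key] for item in data[array_key]]


texts = extract_texts(REVIEW_JSON)
print(len(texts))  # → 10
```

So the jq_schema itself is fine: ten reviews go into the vector store. Note that a vector store returns results ranked by similarity to the query, not in file order, so "first" and "last" review questions depend on retrieval order rather than on the file.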

The code is:

import os
import sys

import openai
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma
from langchain.document_loaders import JSONLoader

os.environ["OPENAI_API_KEY"] = 'YOUR_API_KEY_HERE'

# Enable to save to disk & reuse the model (for repeated queries on the same data)
PERSIST = False

query = None
if len(sys.argv) > 1:
    query = sys.argv[1]


if PERSIST and os.path.exists("persist"):
    print("Reusing index...\n")
    vectorstore = Chroma(persist_directory="persist", embedding_function=OpenAIEmbeddings())
    index = VectorStoreIndexWrapper(vectorstore=vectorstore)
else:
    # One Document per element of the top-level "reviews" array, with its "text" field as content
    loader = JSONLoader("data/review.json", jq_schema=".reviews[]", content_key='text')

    if PERSIST:
        index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory": "persist"}).from_loaders([loader])
    else:
        index = VectorstoreIndexCreator().from_loaders([loader])

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=index.vectorstore.as_retriever()
)

chat_history = []
while True:
    if not query:
        query = input("Prompt: ")
    if query in ['quit', 'q', 'exit']:
        sys.exit()
    result = chain({"question": query, "chat_history": chat_history})
    print(result['answer'])

    chat_history.append((query, result['answer']))
    query = None

Some example results:

Prompt: can you summarize the data?
Sure! Based on the provided feedback, we have a mix of opinions about the hotels. One person found it to be an average hotel with nothing special, another person had a great experience with excellent service and comfortable rooms, another person was pleasantly surprised by a hotel that exceeded their expectations with spacious and clean rooms, and finally, someone had a disappointing experience with an overpriced hotel that didn't meet their expectations in terms of quality.

Prompt: how many feedbacks present in the data ?
There are four feedbacks present in the data.

Prompt: how many of them are positive (sentiment)?
There are four positive feedbacks present in the data.

Prompt: how many of them are negative?
There are three negative feedbacks present in the data.

Prompt: how many of them are neutral?
Two of the feedbacks are neutral.

Prompt: what is the last review you can see?
The most recent review I can see is: "The hotel exceeded my expectations. The room was spacious and clean."

Prompt: what is the first review you can see?
The first review I can see is "Highly recommended! The hotel has a beautiful view and the staff is friendly."

Prompt: how many total texts are in the JSON file?
I don't know the answer.

I can chat with my data, but apart from the first answer, all the other answers are wrong.

Is there a problem with JSONLoader or the jq_schema? How can I adjust the code so that it produces the expected output?

Best answer

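In ConversationalRetrievalChain, the search returns 4 documents by default; see top_k_docs_for_context: int = 4 in ../langchain/chains/conversational_retrieval/base.py. Only those 4 of the 10 reviews are passed to the LLM, so the model never sees the rest, which is why it reports four feedbacks instead of ten.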

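This makes sense, because you don't want to send every document to the LLM (each one also adds cost). Depending on your use case, you can change the default to a more suitable value with: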

chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=index.vectorstore.as_retriever(search_kwargs={"k": 10})
)

With this change, you get the result:

{'question': 'how many feedbacks present in the data ?',
'chat_history': [],
'answer': 'There are 10 pieces of feedback present in the data.'}
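The effect of k can be reproduced without an API key using a toy retriever. This is an assumption-laden stand-in, not LangChain's implementation: word-overlap scoring replaces OpenAI embeddings and Chroma, and the names tokens and top_k are hypothetical. It ranks all ten reviews against the query and keeps only the top k, so with k=4 the context holds four reviews no matter how many are stored.

```python
import re

# The ten review texts from the question's JSON file.
REVIEWS = [
    "Great hotel, excellent service and comfortable rooms.",
    "I had a terrible experience at this hotel. The room was dirty and the staff was rude.",
    "Highly recommended! The hotel has a beautiful view and the staff is friendly.",
    "Average hotel. The room was okay, but nothing special.",
    "I absolutely loved my stay at this hotel. The amenities were top-notch.",
    "Disappointing experience. The hotel was overpriced for the quality provided.",
    "The hotel exceeded my expectations. The room was spacious and clean.",
    "Avoid this hotel at all costs! The customer service was horrendous.",
    "Fantastic hotel with a great location. I would definitely stay here again.",
    "Not a bad hotel, but there are better options available in the area.",
]


def tokens(text: str) -> set[str]:
    """Lower-cased set of words: a crude stand-in for an embedding vector."""
    return set(re.findall(r"[a-z]+", text.lower()))


def top_k(query: str, docs: list[str], k: int = 4) -> list[str]:
    """Hypothetical toy retriever: rank docs by word overlap with the query, keep k."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


question = "how many feedbacks present in the data ?"
default_context = top_k(question, REVIEWS)      # k=4, like the retriever's default
wider_context = top_k(question, REVIEWS, k=10)  # like search_kwargs={"k": 10}
print(len(default_context), len(wider_context))  # → 4 10
```

Whatever the ranking, any count derived from default_context is at most 4, matching the "four feedbacks" answer in the question; raising k to the number of stored reviews puts all ten into the context, at the price of a longer and costlier prompt.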

For python - LangChain conversational retrieval with JSONLoader, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76670856/
