python - How do I assign code chunks to files after the text splitter (langchain)?

Reposted. Author: 行者123. Updated: 2023-12-02 22:47:45

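I am using Langchain's RecursiveCharacterTextSplitter to split Python files. In doing so I lose the information about which chunk belongs to which file. How can I keep track of the individual chunks afterwards and assign each one to a file name?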

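import os

# Import path as used by the langchain releases available when this was asked (2023).
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter


def index_repo(repo_url):
    os.environ['OPENAI_API_KEY'] = ""

    contents = []
    fileextensions = [".py"]

    print('cloning repo')
    # get_repo is the asker's own helper (not shown); it presumably returns the local clone path.
    repo_dir = get_repo(repo_url)
    print(repo_dir)

    for dirpath, dirnames, filenames in os.walk(repo_dir):
        for file in filenames:
            if file.endswith(tuple(fileextensions)):
                try:
                    with open(os.path.join(dirpath, file), "r", encoding="utf-8") as f:
                        contents.append(f.read())
                except Exception:
                    # Unreadable files are skipped silently.
                    pass

    # Chunk the files. Only the raw strings are passed on, so the file names are lost here.
    text_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON, chunk_size=5000, chunk_overlap=0)
    texts = text_splitter.create_documents(contents)

    return texts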

Best answer

create_documents(texts: List[str], metadatas: Optional[List[dict]] = None) → List[Document]

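Put the file information into the metadata and pass it to create_documents: build one dict per file (for example, holding the file path) and hand the list over as the metadatas argument, in the same order as the texts. Every chunk produced from a given text then carries that text's metadata.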
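As a minimal sketch of that advice: the two helpers below, `collect_python_files` and `split_with_sources`, are hypothetical names introduced for illustration, and the `"source"` metadata key is an arbitrary choice, not something the answer prescribes. The import path assumes the same 2023-era langchain layout as the question; newer releases may expose the splitter from a different package.

```python
import os
from typing import Dict, List, Tuple


def collect_python_files(repo_dir: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read every .py file under repo_dir, keeping texts and metadata aligned."""
    contents: List[str] = []
    metadatas: List[Dict[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(repo_dir):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8") as f:
                    text = f.read()
            except (OSError, UnicodeDecodeError):
                continue  # skip unreadable files; nothing is appended for them
            contents.append(text)
            # "source" is a key chosen for this sketch; any key name works.
            metadatas.append({"source": os.path.relpath(path, repo_dir)})
    return contents, metadatas


def split_with_sources(contents, metadatas, chunk_size=5000):
    """Split the texts and attach each file's metadata to its chunks."""
    # Imported here so the collection step above runs without langchain installed.
    from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter.from_language(
        language=Language.PYTHON, chunk_size=chunk_size, chunk_overlap=0
    )
    # metadatas[i] is copied onto every chunk produced from contents[i].
    return splitter.create_documents(contents, metadatas=metadatas)
```

With these in place, `docs = split_with_sources(*collect_python_files(repo_dir))` would give documents whose `doc.metadata["source"]` names the file each chunk came from. Appending to both lists only after a successful read keeps the two lists the same length, which the positional pairing relies on.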

Regarding "python - How do I assign code chunks to files after the text splitter (langchain)?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/77012240/
