I just deployed the Nous-Hermes-Llama2-70b model on 2x NVIDIA A100 GPUs through Hugging Face Inference Endpoints.
When I tried the following code, the generated responses were incomplete sentences less than one line long.
import requests

API_URL = 'https://myendpoint.us-east-1.aws.endpoints.huggingface.cloud'
headers = {
    "Authorization": "Bearer mytoken1234",
    "Content-Type": "application/json"
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "### Instruction:\r\nCome up with a joke about cats\r\n### Response:\r\n",
})
The output in this case was:
"Why don't cats play poker in the jungle?
Because "
As you can see, the response stopped after nine words.
Do I need to add more headers to the request, such as temperature and max token length? How would I do that? What do I need to do to get complete, full-length responses?
Here is the model I'm using: https://huggingface.co/NousResearch/Nous-Hermes-Llama2-70b
More answers
With most generative AI, you can either wait for the full generation to complete and get the entire result back (which can be very slow if there is a lot of text), or you can stream the results in real time as they are produced. I'm not sure whether Hugging Face Inference Endpoints need to be treated specially in order to stream the result back in real time, but given how things are behaving, that certainly seems to be the case. I highly recommend looking at other examples and determining how to check whether the API is meant to stream its results back or not.
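If you do want token-by-token streaming rather than one blocking response, here is a minimal sketch using the huggingface_hub InferenceClient pointed at a dedicated endpoint URL. The endpoint URL and token are the placeholders from the question, and the generation parameters are assumptions to adapt to your setup.

from huggingface_hub import InferenceClient

# Placeholder endpoint URL and token from the question; replace with your own.
client = InferenceClient(
    model="https://myendpoint.us-east-1.aws.endpoints.huggingface.cloud",
    token="mytoken1234",
)

prompt = "### Instruction:\nCome up with a joke about cats\n### Response:\n"

# stream=True yields generated text chunks as they are produced,
# instead of waiting for the whole completion to finish.
for chunk in client.text_generation(prompt, max_new_tokens=256, stream=True):
    print(chunk, end="", flush=True)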
Recommended answer
Added "max_new_tokens" => 256 as a parameter, fixed it.
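For reference, with a text-generation endpoint the generation options go in a "parameters" object next to "inputs" in the JSON payload, not in the HTTP headers. A minimal sketch building on the query() helper from the question; the specific values are illustrative and worth tuning:

output = query({
    "inputs": "### Instruction:\r\nCome up with a joke about cats\r\n### Response:\r\n",
    "parameters": {
        "max_new_tokens": 256,       # allow longer completions; the default is quite small
        "temperature": 0.7,          # optional: sampling temperature
        "return_full_text": False,   # optional: return only the completion, not the prompt
    },
})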