
azure - Converting an Azure Computer Vision Read response for the Text MergeSkill in Azure Cognitive Search

Reposted. Author: 行者123. Updated: 2023-12-03 07:00:07

The raw Read response from Azure Computer Vision looks like this:

{
"status": "succeeded",
"createdDateTime": "2021-04-08T21:56:17.6819115+00:00",
"lastUpdatedDateTime": "2021-04-08T21:56:18.4161316+00:00",
"analyzeResult": {
"version": "3.2",
"readResults": [
{
"page": 1,
"angle": 0,
"width": 338,
"height": 479,
"unit": "pixel",
"lines": [
{
"boundingBox": [
25,
14
],
"text": "NOTHING",
"appearance": {
"style": {
"name": "other",
"confidence": 0.971
}
},
"words": [
{
"boundingBox": [
27,
15
],
"text": "NOTHING",
"confidence": 0.994
}
]
}
]
}
]
}
}

Copied from here

I want to create a custom skill in Azure Cognitive Search that does not use the VisionSkill but instead calls my own Azure Function, which uses the Computer Vision client in code.

The problem is passing the input to Text.MergeSkill:

{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name":"text",
"source": "/document/content"
},
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/text"
},
{
"name":"offsets",
"source": "/document/normalized_images/*/contentOffset"
}
],
"outputs": [
{
"name": "mergedText",
"targetName" : "merged_text"
}
]
}

I need to convert the Read output into the form that OcrSkill returns, so that my custom skill can return it. The response must look like this:

{
"text": "Hello World. -John",
"layoutText":
{
"language" : "en",
"text" : "Hello World.",
"lines" : [
{
"boundingBox":
[ {"x":10, "y":10}, {"x":50, "y":10}, {"x":50, "y":30},{"x":10, "y":30}],
"text":"Hello World."
}
],
"words": [
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"Hello"
},
{
"boundingBox": [ {"x":110, "y":10}, {"x":150, "y":10}, {"x":150, "y":30},{"x":110, "y":30}],
"text":"World."
}
]
}
}

I copied it from here

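My question is: how do I convert the boundingBox values from the Computer Vision Read endpoint into the form that Text.MergeSkill accepts? Do we really need to do this, or can we pass the Read response to Text.MergeSkill in a different way?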

Best answer

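The built-in OCRSkill calls the Cognitive Services Computer Vision Read API for certain languages, and it handles merging the text for you through its "text" output. If at all possible, I strongly recommend using that skill rather than writing a custom one.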

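If you must write a custom skill and merge the output text yourself, note that according to the MergeSkill documentation the "text" and "offsets" inputs are optional. This means that if you only need a way to merge these outputs into one large text, you should be able to pass the text from the individual Read API output objects directly to MergeSkill through the "itemsToInsert" input. That would make your skillset look something like this (not tested to be sure), assuming you are still using the built-in Azure Search image extraction and your custom skill outputs the exact payload returned by the Read API that you shared above.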

{
"skills": [
{
"@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"description": "Custom skill that calls Cognitive Services Computer Vision Read API",
"uri": "<your custom skill uri>",
"batchSize": 1,
"context": "/document/normalized_images/*",
"inputs": [
{
"name": "image",
"source": "/document/normalized_images/*"
}
],
"outputs": [
{
"name": "readAPIOutput"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Create merged_text, which includes all the textual representation of each image inserted at the right location in the content field.",
"context": "/document",
"insertPreTag": "",
"insertPostTag": "\n",
"inputs": [
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/readAPIOutput/analyzeResult/readResults/*/lines/*/text"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "merged_text"
}
]
}
]
}
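For the "readAPIOutput" path in the skillset above to resolve, the custom skill has to return the Read payload inside the standard custom Web API skill envelope: a "values" array in which each record echoes its "recordId" and carries its outputs under "data". The following is a minimal sketch of that wrapping, not a tested implementation. The function name build_skill_response and the run_read callback are hypothetical; run_read stands in for whatever code calls the Computer Vision client and returns the Read response as a dict.

```python
from typing import Any, Callable, Dict, List


def build_skill_response(
    request_body: Dict[str, Any],
    run_read: Callable[[Any], Dict[str, Any]],
) -> Dict[str, Any]:
    """Wrap one Read result per input record in the custom-skill envelope.

    run_read is a hypothetical callback (an assumption, not an Azure API):
    it receives the record's "image" input and returns the Read response.
    """
    results: List[Dict[str, Any]] = []
    for record in request_body.get("values", []):
        out: Dict[str, Any] = {
            "recordId": record["recordId"],  # must echo the incoming id
            "data": {},
            "errors": [],
            "warnings": [],
        }
        try:
            image = record["data"]["image"]
            # The key must match the output name declared in the skillset.
            out["data"]["readAPIOutput"] = run_read(image)
        except Exception as exc:  # report per-record failures, keep the batch
            out["errors"].append({"message": str(exc)})
        results.append(out)
    return {"values": results}


def fake_read(image: Any) -> Dict[str, Any]:
    """Stand-in for the real Computer Vision call, for illustration only."""
    return {
        "status": "succeeded",
        "analyzeResult": {
            "readResults": [{"page": 1, "lines": [{"text": "NOTHING"}]}]
        },
    }
```

In an Azure Function, the request JSON would be passed as request_body and the returned dict serialized as the HTTP response body. Failures are recorded per record so that one bad image does not fail the whole batch.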

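However, if you need to guarantee that the text appears in the correct order according to the bounding boxes, you will probably need to write a custom solution that computes positions and reassembles the text yourself. That is why, if possible, we recommend using our built-in solution in OCRSkill.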
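As a rough illustration of such a custom solution, the sketch below orders Read lines top to bottom and left to right, and reshapes the result into the OcrSkill-like form quoted in the question. It assumes each Read boundingBox is a flat list [x1, y1, x2, y2, ...] starting at the top-left corner (the sample in the question is abbreviated to the first two values). The helper names and the row_tolerance parameter are hypothetical, and the simple row grouping will not handle multi-column or rotated layouts.

```python
from typing import Any, Dict, List


def to_points(flat: List[float]) -> List[Dict[str, float]]:
    """Convert [x1, y1, x2, y2, ...] into [{"x": x1, "y": y1}, ...]."""
    return [{"x": flat[i], "y": flat[i + 1]} for i in range(0, len(flat) - 1, 2)]


def order_lines(page: Dict[str, Any], row_tolerance: float = 10.0) -> List[Dict[str, Any]]:
    """Group lines into rows by top-left y, then sort each row by top-left x."""
    by_top = sorted(page["lines"], key=lambda ln: ln["boundingBox"][1])
    rows: List[List[Dict[str, Any]]] = []
    for ln in by_top:
        top = ln["boundingBox"][1]
        # Same row if close enough to the first line of the current row.
        if rows and abs(top - rows[-1][0]["boundingBox"][1]) <= row_tolerance:
            rows[-1].append(ln)
        else:
            rows.append([ln])
    ordered: List[Dict[str, Any]] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda ln: ln["boundingBox"][0]))
    return ordered


def read_to_ocr_shape(read_response: Dict[str, Any]) -> Dict[str, Any]:
    """Build {"text", "layoutText"} from a Read response, in reading order."""
    lines_out: List[Dict[str, Any]] = []
    words_out: List[Dict[str, Any]] = []
    pages = sorted(read_response["analyzeResult"]["readResults"],
                   key=lambda p: p.get("page", 0))
    for page in pages:
        for ln in order_lines(page):
            lines_out.append({"boundingBox": to_points(ln["boundingBox"]),
                              "text": ln["text"]})
            for word in ln.get("words", []):
                words_out.append({"boundingBox": to_points(word["boundingBox"]),
                                  "text": word["text"]})
    text = " ".join(ln["text"] for ln in lines_out)
    return {
        "text": text,
        "layoutText": {"text": text, "lines": lines_out, "words": words_out},
    }
```

If only the merged string is needed, the "text" value alone can be returned from the custom skill and passed to MergeSkill through "itemsToInsert"; the point conversion matters only when a downstream consumer expects the OcrSkill layout shape.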

Regarding azure - Converting an Azure Computer Vision Read response for the Text MergeSkill in Azure Cognitive Search, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72531773/
