
python - Using Textract for OCR locally

Reposted. Author: 行者123. Updated: 2023-12-03 19:04:23

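I want to extract text from an image using Python. (The Tesseract library doesn't work for me because it requires installation.)
I found the boto3 library and Textract, but I'm having trouble using them. I'm still new to this. Can you tell me what I need to do to get my script running correctly?
Here is my code: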

import cv2
import boto3
import textract


#img = cv2.imread('slika2.jpg') #this is jpg file
with open('slika2.pdf', 'rb') as document:
    img = bytearray(document.read())

textract = boto3.client('textract',region_name='us-west-2')

response = textract.detect_document_text(Document={'Bytes': img}) #gives me error
print(response)
When I run this code, I get:
botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
I also tried this:
# Document
documentName = "slika2.jpg"

# Read document content
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

# Amazon Textract client
textract = boto3.client('textract',region_name='us-west-2')

# Call Amazon Textract
response = textract.detect_document_text(Document={'Bytes': imageBytes}) #ERROR

#print(response)

# Print detected text
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print ('\033[94m' + item["Text"] + '\033[0m')
But I get this error:
botocore.exceptions.ClientError: An error occurred (InvalidSignatureException) when calling the DetectDocumentText operation: The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
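I'm a beginner, so any help would be appreciated. How can I read the text from my image or PDF file?
I also added this code, but then I still get an error: Unable to locate credentials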
session = boto3.Session(
    aws_access_key_id='xxxxxxxxxxxx',
    aws_secret_access_key='yyyyyyyyyyyyyyyyyyyyy'
)

Best answer

The problem is in how the credentials are passed to boto3. You have to pass the credentials when you create the boto3 client.

import boto3

# boto3 client, created with the credentials passed in directly
client = boto3.client(
    'textract',
    region_name='us-west-2',
    aws_access_key_id='xxxxxxx',
    aws_secret_access_key='xxxxxxx'
)

# Read the image as raw bytes
with open('slika2.png', 'rb') as document:
    img = bytearray(document.read())

# Call Amazon Textract
response = client.detect_document_text(
    Document={'Bytes': img}
)

# Print each detected line of text (wrapped in ANSI codes for blue output)
for item in response["Blocks"]:
    if item["BlockType"] == "LINE":
        print('\033[94m' + item["Text"] + '\033[0m')
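The printing loop above can also be wrapped in a small helper, which makes it possible to check the parsing logic without calling AWS at all. This is only a minimal sketch: `extract_lines` is a hypothetical name (not part of boto3), and the `sample` dictionary is hand-made to mimic the `Blocks` shape used in the answer, not real Textract output.

```python
def extract_lines(response):
    """Return the text of every LINE block in a DetectDocumentText-style response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# Hand-made sample in the same shape as the response used above (not real output).
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Hello"},
        {"BlockType": "WORD", "Text": "Hello"},
        {"BlockType": "LINE", "Text": "World"},
    ]
}

lines = extract_lines(sample)
print("\n".join(lines))
```

With a real client, the same helper would be called as `extract_lines(client.detect_document_text(Document={'Bytes': img}))`.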
Note that hard-coding AWS keys in your code is not recommended. See the following documentation:
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/configuration.html
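One way to avoid hard-coded keys is to let boto3 pick them up from the environment: it reads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` on its own, so the client can be created without any key arguments. The sketch below is an assumption about how this could be arranged, not part of the original answer; `missing_credential_vars` and `make_textract_client` are hypothetical helper names, and the check only covers the environment-variable route (credentials stored in `~/.aws/credentials` would also work but are not detected by it).

```python
import os

# Environment variables that boto3 reads credentials from automatically.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credential_vars(env=None):
    """Return the names of the credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def make_textract_client(region_name="us-west-2"):
    """Create a Textract client without passing any keys explicitly."""
    import boto3  # imported lazily so the check above runs without boto3 installed

    return boto3.client("textract", region_name=region_name)


if __name__ == "__main__":
    missing = missing_credential_vars()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        client = make_textract_client()
```

The same principle applies to an explicit `boto3.Session(...)`: a session only has an effect if the client is created from it with `session.client('textract', ...)`. A session that is built but never used does not change what `boto3.client(...)` sees.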

Regarding python - Using Textract for OCR locally, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64045020/
