python - Uploading a PDF file with OCR using gdata docs Python v3.0

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:37:15

I have the following implementation for uploading a PDF file to Google Docs (taken from the gdata API samples):

def UploadResourceSample():
    """Upload a document, and convert to Google Docs."""
    client = CreateClient()
    doc = gdata.docs.data.Resource(type='document', title='My Sample Doc')

    # This is a convenient MS Word doc that we know exists
    path = _GetDataFilePath('test.0.doc')
    print 'Selected file at: %s' % path

    # Create a MediaSource, pointing to the file
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/msword')

    # Pass the MediaSource when creating the new Resource
    doc = client.CreateResource(doc, media=media)
    print 'Created, and uploaded:', doc.title.text, doc.resource_id.text

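Now I would like to run OCR text recognition on the uploaded file, but I am not sure how to enable OCR in the gdata docs Python API. So my question is: is there a way to enable OCR on a PDF file using the gdata Python v3.0 API?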

Best answer

I managed to run OCR on my PDF documents with the following code:

def UploadResourceSample(filename, filepath, fullpath):
    """Upload a document, and convert to Google Docs."""
    client = CreateClient()
    doc = gdata.docs.data.Resource(type='document', title=filename)

    path = fullpath
    print 'Selected file at: %s' % path

    # Create a MediaSource, pointing to the file
    media = gdata.data.MediaSource()
    media.SetFileHandle(path, 'application/pdf')

    # Pass the MediaSource when creating the new Resource
    create_uri = gdata.docs.client.RESOURCE_UPLOAD_URI + '?ocr=true&ocr-language=de'
    doc = client.CreateResource(doc, create_uri=create_uri, media=media)
    print 'Created, and uploaded:', doc.title.text, doc.resource_id.text
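
The key change in the answer is the query string appended to the upload URI. As a minimal sketch, the same string can be built with a small helper instead of hard-coding it. The helper name build_ocr_upload_uri and the placeholder path '/upload/path' are hypothetical and only for illustration; in real use the base would be gdata.docs.client.RESOURCE_UPLOAD_URI. Treating the language hint as optional is also an assumption here.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, which gdata targets


def build_ocr_upload_uri(base_uri, language=None):
    """Append the OCR query parameters from the answer to an upload URI.

    Hypothetical helper, not part of the gdata library.
    """
    params = [('ocr', 'true')]
    if language:
        # Language hint for OCR, e.g. 'de' as used in the answer.
        params.append(('ocr-language', language))
    # Use '&' if the base URI already carries a query string.
    separator = '&' if '?' in base_uri else '?'
    return base_uri + separator + urlencode(params)


# Placeholder standing in for gdata.docs.client.RESOURCE_UPLOAD_URI.
base = '/upload/path'
print(build_ocr_upload_uri(base, 'de'))
```

The result could then be passed as create_uri to client.CreateResource, exactly as the answer does with its hard-coded string.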

Regarding "python - Uploading a PDF file with OCR using gdata docs Python v3.0", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/8689021/
