
r - Tesseract "Error in pixCreateNoInit: pix_malloc fail for data"

Reposted | Author: 行者123 | Updated: 2023-12-05 06:37:48

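I'm trying to run this inside a function loosely based on this one. However, since Xpdf can convert PDFs to PNGs itself, I skipped the ImageMagick conversion step. I also dropped the erroneous logic of the function(i) procedure: pdftopng needs a root name, which in this case produces "ocrbook-000001.png", and the original logic threw an error when it looked for a PNG named after the original PDF file.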
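pdftopng writes one file per page, named <root>-<page>.png with the page number zero-padded to six digits (as in ocrbook-000001.png). A minimal sketch, assuming that naming scheme, of deriving a root from each PDF's own name and predicting the page files it will produce. png_names_for_root and pdf_root are hypothetical helpers, not part of Xpdf or R:

```r
# Hypothetical helper: predict the per-page PNG names pdftopng writes for a root.
# Assumption: six zero-padded digits per page, as in "ocrbook-000001.png".
png_names_for_root <- function(root, first = 1, last = 10) {
  sprintf("%s-%06d.png", root, first:last)
}

# Hypothetical helper: use the PDF's own path (minus ".pdf") as the root,
# so the images from different PDFs do not overwrite each other
pdf_root <- function(pdf_path) {
  tools::file_path_sans_ext(pdf_path)
}

png_names_for_root(pdf_root("C:/users/YOURNAME/desktop/report.pdf"), 1, 3)
```

Using one root per PDF avoids the collision you get when every file is rendered to the fixed root "ocrbook".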

My problem now is getting Tesseract to do anything with my PNG files. I get this error:

Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Error in pixCreateNoInit: pix_malloc fail for data
Error in pixCreate: pixd not made
Error in pixReadStreamPng: pix not made
Error in pixReadStream: png: no pix returned
Error in pixRead: pix not read
Error during processing.

Here is my code:

lapply(myfiles, function(i){

  shell(shQuote(paste0("pdftopng -f 1 -l 10 -r 600 ", i, " ocrbook")))
  mypngs <- list.files(path = dest, pattern = "png", full.names = TRUE)
  lapply(mypngs, function(z){
    shell(shQuote(paste0("tesseract ", z, " out")))
    file.remove(paste0(z))
  })
})

Best answer

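Apparently the DPI was set too high for Tesseract to handle. Changing the DPI argument (-r) of the PDF-to-image conversion from 600 to 150 seems to have fixed the problem. Tesseract seems to have an upper limit on the image size it can cope with: the error messages show that Leptonica, the image library Tesseract uses, could not allocate memory for the pixel data (pix_malloc fail), and a page rendered at 600 DPI is a very large image.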
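To get a feel for why the resolution matters, the sketch below estimates the size of the uncompressed pixel buffer for one page. It assumes a US Letter page (8.5 x 11 in) and 4 bytes per pixel, which is how Leptonica holds RGB images; page_buffer_mib is a hypothetical function name. Because the buffer grows with the square of the DPI, going from 600 to 150 DPI cuts it by a factor of 16 (about 128 MiB versus 8 MiB per page under these assumptions). Whether a given size actually fails depends on the Tesseract build and the memory available; this only illustrates the scaling.

```r
# Rough size (MiB) of the uncompressed pixel buffer for one rendered page.
# Assumptions (for illustration only): US Letter page, 4 bytes per pixel (RGB in Leptonica).
page_buffer_mib <- function(dpi, width_in = 8.5, height_in = 11, bytes_per_pixel = 4) {
  width_px  <- round(width_in * dpi)   # e.g. 5100 px at 600 DPI
  height_px <- round(height_in * dpi)  # e.g. 6600 px at 600 DPI
  width_px * height_px * bytes_per_pixel / 2^20
}

page_buffer_mib(600)  # about 128.4
page_buffer_mib(150)  # about 8.0
```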

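I also changed my code from a static naming convention to a more dynamic one that mirrors each file's original name. The revised version renders the pages with pdftoppm, converts them to TIFF with ImageMagick, and then runs Tesseract on the TIFFs: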

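dest <- "C:\\users\\YOURNAME\\desktop"

# shell() is Windows-only; each path is quoted separately so names with spaces work

# Step 1: render pages 1-10 of each PDF at 150 DPI; pdftoppm names the images <pdf name>-<page>.ppm
files <- tools::file_path_sans_ext(list.files(path = dest, pattern = "\\.pdf$", full.names = TRUE))
lapply(files, function(i){
  shell(paste("pdftoppm -f 1 -l 10 -r 150", shQuote(paste0(i, ".pdf")), shQuote(i)))
})

# Step 2: convert each PPM to TIFF with ImageMagick, then delete the PPM
myppms <- tools::file_path_sans_ext(list.files(path = dest, pattern = "\\.ppm$", full.names = TRUE))
lapply(myppms, function(y){
  shell(paste("magick", shQuote(paste0(y, ".ppm")), shQuote(paste0(y, ".tif"))))
  file.remove(paste0(y, ".ppm"))
})

# Step 3: OCR each TIFF (Tesseract writes <name>.txt), then delete the TIFF
mytiffs <- tools::file_path_sans_ext(list.files(path = dest, pattern = "\\.tif$", full.names = TRUE))
lapply(mytiffs, function(z){
  shell(paste("tesseract", shQuote(paste0(z, ".tif")), shQuote(z)))
  file.remove(paste0(z, ".tif"))
})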
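The code above relies on shell(), which exists only in R on Windows. As a hedged sketch, the same PDF to PPM to TIFF to text pipeline can be written with base R's system2(), passing the arguments as a vector and quoting each path. It assumes pdftoppm, ImageMagick's magick and tesseract are on the PATH; pdftoppm_args and ocr_pdf are hypothetical helper names, and the page range and 150 DPI default simply mirror the answer.

```r
# Hypothetical helper: build the pdftoppm argument vector, quoting each path
pdftoppm_args <- function(pdf, root, first = 1, last = 10, dpi = 150) {
  c("-f", first, "-l", last, "-r", dpi, shQuote(pdf), shQuote(root))
}

# Hypothetical helper: PDF -> PPM -> TIFF -> text for a single PDF
ocr_pdf <- function(pdf, first = 1, last = 10, dpi = 150) {
  root <- tools::file_path_sans_ext(pdf)
  system2("pdftoppm", pdftoppm_args(pdf, root, first, last, dpi))

  # Keep only this PDF's page images, i.e. files named <root>-<digits>.ppm
  stem <- basename(root)
  candidates <- list.files(dirname(pdf), pattern = "\\.ppm$", full.names = TRUE)
  names_only <- basename(candidates)
  is_page <- startsWith(names_only, stem) &
    grepl("^-[0-9]+\\.ppm$", substring(names_only, nchar(stem) + 1))
  ppms <- candidates[is_page]

  for (ppm in ppms) {
    page_root <- tools::file_path_sans_ext(ppm)
    tif <- paste0(page_root, ".tif")
    system2("magick", c(shQuote(ppm), shQuote(tif)))           # PPM -> TIFF
    system2("tesseract", c(shQuote(tif), shQuote(page_root)))  # writes <page_root>.txt
    file.remove(c(ppm, tif))
  }

  # Return the paths of the text files Tesseract should have written
  invisible(sub("\\.ppm$", ".txt", ppms))
}
```

For example, lapply(list.files(dest, pattern = "\\.pdf$", full.names = TRUE), ocr_pdf) would process every PDF in dest. Keeping the argument vector in its own function makes the exact command easy to inspect before anything is run.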

A similar question about r - Tesseract "Error in pixCreateNoInit: pix_malloc fail for data" can be found on Stack Overflow: https://stackoverflow.com/questions/47104245/
