java - Text extraction and segmentation with OpenCV

Reposted by: 太空宇宙 | Updated: 2023-11-04 14:58:45

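I have never used OpenCV before, but I am trying to write a neural network system that recognizes text, and I need some tools for text extraction and segmentation.

How can I use OpenCV from Java to preprocess and segment an image that contains text?
I don't need to recognize the text; I only need to get each letter as a separate image.
Something like this:
[image]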

Best answer

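Try this code. It does not need OpenCV, although it does use ImageUtilities from Neuroph (org.neuroph.imgrec) for cropping and trimming.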

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.neuroph.imgrec.ImageUtilities;


public class CharExtractor {

private int cropTopY = 0;      // top edge of the current crop region
private int cropBottomY = 0;   // bottom edge of the current crop region
private int cropLeftX = 0;     // left edge of the current crop region
private int cropRightX = 0;    // right edge of the current crop region
private BufferedImage imageWithChars = null;
private boolean endOfImage;    // set once the whole image has been scanned
private boolean endOfRow;      // set once the current text line has been scanned

/**
 * Creates a new char extractor for the specified text image
 * @param imageWithChars - image with text
 */
public CharExtractor(BufferedImage imageWithChars) {
    this.imageWithChars = imageWithChars;
}

public void setImageWithChars(BufferedImage imageWithChars) {
    this.imageWithChars = imageWithChars;
}

/**
 * Scans the image until it finds the first black pixel (TODO: use a configurable foreground color, black by default).
 * When a black pixel is found, sets cropTopY and returns true. If the end of the image is reached without
 * finding one, sets the endOfImage flag and returns false.
 * @return true if a black pixel was found and cropTopY was updated, false otherwise
 */
private boolean findCropTopY() {
    for (int y = cropBottomY; y < imageWithChars.getHeight(); y++) { // start at cropBottomY of the previous text line (zero for the first line)
        for (int x = cropLeftX; x < imageWithChars.getWidth(); x++) { // scan right from cropLeftX, which extractCharImagesToRecognize resets to zero after each text line
            if (imageWithChars.getRGB(x, y) == -16777216) { // opaque black, 0xFF000000 (TODO: also accept near-black or any non-background color)
                this.cropTopY = y;   // save the current y coordinate
                return true;
            }
        }
    }
    endOfImage = true;  // no black pixel found in the rest of the image
    return false;
}

/**
 * Scans downward from cropTopY until it finds the first row made up entirely of white pixels
 * (TODO: use a configurable background color, white by default).
 * When such a row is found, or the last row of the image is reached, sets cropBottomY and returns true.
 * @return true if cropBottomY was set, false otherwise
 */
private boolean findCropBottomY() {
    for (int y = cropTopY + 1; y < imageWithChars.getHeight(); y++) { // scan image from top to bottom
        int whitePixCounter = 0; // number of white pixels in this row
        for (int x = cropLeftX; x < imageWithChars.getWidth(); x++) { // scan all pixels to the right of the left crop position
            if (imageWithChars.getRGB(x, y) == -1) {    // opaque white, 0xFFFFFFFF
                whitePixCounter++;
            }
        }
        if (whitePixCounter == imageWithChars.getWidth() - cropLeftX) { // every pixel scanned in this row (columns cropLeftX..width-1) is white
            cropBottomY = y; // this blank row becomes the bottom bound of the text line
            return true;
        }
        if (y == imageWithChars.getHeight() - 1) {  // reached the last row without finding a blank row
            cropBottomY = y;                        // use the last row as the bottom bound
            endOfImage = true;                      // and mark the image as finished
            return true;
        }
    }
    endOfImage = true; // only reached when cropTopY is already the last row; stop so the caller does not rescan it forever
    return false;
}

/**
 * Scans columns to the right of cropRightX until it finds one that contains a black pixel
 * between cropTopY and cropBottomY.
 * @return true if cropLeftX was set, false if the end of the row was reached
 */
private boolean findCropLeftX() {
    for (int x = cropRightX; x < imageWithChars.getWidth(); x++) {      // start from the right edge of the previous letter and scan to the right
        for (int y = cropTopY; y <= cropBottomY; y++) {                 // vertical scan of the current column
            if (imageWithChars.getRGB(x, y) == -16777216) {             // opaque black, 0xFF000000
                cropLeftX = x;                                          // first column of the next letter
                return true;
            }
        }
    }
    endOfRow = true;        // no more black pixels in this text line
    return false;
}

/**
 * Scans columns to the right of cropLeftX until it finds one in which every pixel between
 * cropTopY and cropBottomY is white.
 * @return true if cropRightX was set, false otherwise
 */
private boolean findCropRightX() {
    for (int x = cropLeftX + 1; x < imageWithChars.getWidth(); x++) {   // start just right of cropLeftX and scan to the right
        int whitePixCounter = 0;
        for (int y = cropTopY; y <= cropBottomY; y++) {                 // vertical scan of the current column
            if (imageWithChars.getRGB(x, y) == -1) {                    // opaque white, 0xFFFFFFFF
                whitePixCounter++;
            }
        }

        // a fully white column is the gap that ends the current character
        int heightPixels = cropBottomY - cropTopY;                      // crop height minus one, since both bounds are inclusive
        if (whitePixCounter == heightPixels + 1) {                      // every pixel in the column is white (+1 also covers a crop that is one pixel tall)
            cropRightX = x;
            return true;
        }

        // needed for the last letter in the row, which may touch the right edge of the image
        if (x == imageWithChars.getWidth() - 1) {
            cropRightX = x;
            endOfRow = true;
            return true;
        }
    }
    endOfRow = true; // only reached when cropLeftX is already the last column; end the row so the caller does not loop forever
    return false;
}

/**
 * Splits the image into text lines and then into characters.
 * @return the cropped and trimmed character images, in reading order
 */
public List<BufferedImage> extractCharImagesToRecognize() {
    List<BufferedImage> trimmedImages = new ArrayList<BufferedImage>();

    while (!endOfImage) {
        endOfRow = false;
        if (findCropTopY() && findCropBottomY()) {
            while (!endOfRow) {
                if (findCropLeftX() && findCropRightX()) {
                    BufferedImage image = ImageUtilities.trimImage(ImageUtilities.cropImage(imageWithChars, cropLeftX, cropTopY, cropRightX, cropBottomY));
                    trimmedImages.add(image);
                }
            }
            cropLeftX = 0;  // restart at the left edge for the next text line
            cropRightX = 0;
        }
    }
    // reset the scan state so the extractor can be reused
    cropTopY = 0;
    cropBottomY = 0;
    endOfImage = false;

    return trimmedImages;
}


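public static void main(String[] args) throws Exception {
    File f = new File("./written.png");
    BufferedImage img = ImageIO.read(f); // returns null when no registered reader can decode the file
    if (img == null) {
        throw new IllegalArgumentException("Cannot read image: " + f);
    }
    CharExtractor ch = new CharExtractor(img);
    List<BufferedImage> list = ch.extractCharImagesToRecognize();

    // write each extracted character to its own PNG file
    for (int i = 0; i < list.size(); i++) {
        File outputfile = new File("./char_" + i + ".png");
        ImageIO.write(list.get(i), "png", outputfile);
    }
}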
}
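
The extractor above only matches exact pixel values: -16777216 (opaque black) for ink and -1 (opaque white) for background. Scanned or anti-aliased images contain many in-between grays, so a thresholding pass before extraction is probably needed, which is also the "preprocess" part of the question. The following is a minimal sketch under that assumption; the class name Binarizer, the method binarize and the threshold parameter are hypothetical and are not part of the original answer or of Neuroph.

```java
import java.awt.image.BufferedImage;

class Binarizer {

    /**
     * Returns a copy of the source in which every pixel is either opaque black
     * (0xFF000000) or opaque white (0xFFFFFFFF). Pixels whose luma is below
     * the threshold (0..255) become black, all others become white.
     * The alpha channel of the source is ignored.
     */
    public static BufferedImage binarize(BufferedImage source, int threshold) {
        BufferedImage result = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < source.getHeight(); y++) {
            for (int x = 0; x < source.getWidth(); x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // integer approximation of the Rec. 601 luma weights
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                result.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return result;
    }
}
```

With this in place, something like new CharExtractor(Binarizer.binarize(img, 128)) would hand the extractor a strictly two-color image. The value 128 is only a starting point; the right threshold depends on the input.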

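The answer relies on Neuroph's ImageUtilities for cropImage and trimImage. If Neuroph is not on the classpath, a JDK-only stand-in could look roughly like the sketch below. GlyphCropper, crop and trim are hypothetical names, and the coordinate convention (left and top inclusive, right and bottom exclusive) is an assumption made for this sketch, not Neuroph's documented behavior.

```java
import java.awt.image.BufferedImage;

class GlyphCropper {

    private static final int WHITE = 0xFFFFFFFF; // opaque white, same value as -1

    /**
     * Crops the rectangle whose top-left corner (left, top) is inclusive and
     * whose bottom-right corner (right, bottom) is exclusive.
     * The result shares pixel data with the input image.
     */
    public static BufferedImage crop(BufferedImage image, int left, int top, int right, int bottom) {
        return image.getSubimage(left, top, right - left, bottom - top);
    }

    /**
     * Removes all-white margins around the content.
     * Returns the input unchanged if it contains no non-white pixel.
     */
    public static BufferedImage trim(BufferedImage image) {
        int minX = image.getWidth();
        int minY = image.getHeight();
        int maxX = -1;
        int maxY = -1;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if (image.getRGB(x, y) != WHITE) {
                    // grow the bounding box to include this non-white pixel
                    minX = Math.min(minX, x);
                    maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return image; // blank image: nothing to trim
        }
        return image.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

Because getSubimage returns a view onto the same raster, drawing into a cropped image also changes the original; copy the result first if the pieces need to be edited independently.
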
Regarding java - Text extraction and segmentation with OpenCV, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22879005/
