Basic OCR

This is a very basic optical character recognition (OCR) script written in PHP. It is untested and serves merely as a proof of concept. As noted in the code comments, adjusting the sample size changes the results: with a large sample size on a small image there can be many collisions, where different characters produce the same output string. A database of known values is needed to compare against the output of this script.

[code lang="PHP"]

<?php
/* Create a test image: black text on a white background */
$im = @imagecreate(100, 20) or die("Cannot Initialize new GD image stream");
$background_color = imagecolorallocate($im, 255, 255, 255);
$text_color = imagecolorallocate($im, 0, 0, 0);
imagestring($im, 1, 5, 5, "Hello, World!", $text_color);

/***
* Assumptions:
* A monochrome image where characters are black
* A single character is connected (it contains no blank column)
* Characters are separated by white space (at least one blank column)
*/

$width = imagesx($im);
$height = imagesy($im);

/***
* Notes:
* The smaller the sample size, the more accurate the result,
* but the longer it takes. Larger images can use a larger
* sample size without compromising much accuracy.
*/
$x_sample = 1;
$y_sample = 1;

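$last = 0;    // pixel sum of the previous column (0 = blank column)
$l = 0;       // column index within the current character
$sample = ""; // sampled pixels of the current character

for ($i = 0; $i < $width; $i++) {
    // Read one pixel column. imagecolorat() returns the palette index:
    // 0 for the white background, 1 for the black text.
    $col = array();
    for ($j = 0; $j < $height; $j++) {
        $col[$j] = imagecolorat($im, $i, $j);
    }

    if (($current = array_sum($col)) > 0) {
        // A non-blank column after a blank one starts a new character
        if ($last == 0) {
            $l = 0;
        }

        // Keep every $x_sample-th column and every $y_sample-th row
        if (($l % $x_sample) == 0) {
            for ($k = 0; $k < $height; $k++) {
                if (($k % $y_sample) == 0) {
                    $sample .= $col[$k];
                }
            }
        }

        $l++;
        $last = $current;
    } else {
        $last = 0;
    }

    // A blank column ends the character: print its sample and reset
    if ($sample !== "" && $last == 0) {
        echo $sample . "\n";
        $sample = "";
    }
}

// Print a final character that touches the right edge of the image
if ($sample !== "") {
    echo $sample . "\n";
}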
?>[/code]
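
To illustrate the note about sample size, here is a minimal sketch that applies the same sampling rule to characters given as arrays of column strings. It is not part of the script above: the helper `sampleColumns` and the two 2x4 glyphs are made up for illustration. It shows how two different glyphs can collide once the horizontal sample size grows.

[code lang="PHP"]
<?php
// Hypothetical helper: applies the script's sampling rule to a character
// given as an array of column strings ('0' = white, '1' = black).
function sampleColumns(array $columns, int $xSample, int $ySample): string
{
    $sample = "";
    foreach (array_values($columns) as $l => $col) {
        if ($l % $xSample !== 0) {
            continue; // this column is not sampled
        }
        for ($k = 0; $k < strlen($col); $k++) {
            if ($k % $ySample === 0) {
                $sample .= $col[$k];
            }
        }
    }
    return $sample;
}

// Two made-up glyphs that differ only in their second column
$glyphA = array("1111", "1001");
$glyphB = array("1111", "0110");

echo sampleColumns($glyphA, 1, 1), "\n"; // 11111001
echo sampleColumns($glyphB, 1, 1), "\n"; // 11110110
echo sampleColumns($glyphA, 2, 1), "\n"; // 1111
echo sampleColumns($glyphB, 2, 1), "\n"; // 1111 (collides with glyph A)
?>[/code]

With a sample size of 1 the two glyphs stay distinguishable. With a horizontal sample size of 2 only the first column is kept, so both produce the same string.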

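Output (each line represents one character; with a sample size of 1, every 20 digits are one pixel column read from top to bottom, where 1 is black and 0 is white)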

00000011111100000000000000001000000000000000000010000000000000000011111100000000
00000000011000000000000000001011000000000000000011010000000000000000010100000000
000000100001000000000000001111110000000000000000000100000000
000000100001000000000000001111110000000000000000000100000000
00000000011000000000000000001001000000000000000010010000000000000000011000000000
000000000001000000000000000001100000000000000000010000000000
00000011111100000000000000000110000000000000000001100000000000000011111100000000
00000000011000000000000000001001000000000000000010010000000000000000011000000000
00000000111100000000000000000100000000000000000010000000000000000000010000000000
000000100001000000000000001111110000000000000000000100000000
00000000011000000000000000001001000000000000000010100000000000000011111100000000
00000001110100000000
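
The comparison against known values could look something like the sketch below. It assumes a sample size of 1 and the 20-pixel-high test image, and it uses a plain PHP array in place of a database. The names `$known`, `recognize` and `renderSample` are hypothetical. The three training entries are the first, third and last lines of the output above, which correspond to "H", "l" and "!" in "Hello, World!".

[code lang="PHP"]
<?php
// Assumption: $x_sample = $y_sample = 1 and the image is 20 pixels high,
// so every pixel column contributes exactly 20 digits to a sample string.
const IMAGE_HEIGHT = 20;

// Hypothetical training set, written one 20-digit column per piece
$known = array(
    "00000011111100000000" . "00000000100000000000" .
    "00000000100000000000" . "00000011111100000000" => "H",
    "00000010000100000000" . "00000011111100000000" .
    "00000000000100000000" => "l",
    "00000001110100000000" => "!",
);

// Look up a sample string; "?" marks a string not in the training set
function recognize(string $sample, array $known): string
{
    return $known[$sample] ?? "?";
}

// Draw a sample string as a grid ('#' = black, '.' = white) to inspect it
function renderSample(string $sample, int $height = IMAGE_HEIGHT): string
{
    $columns = str_split($sample, $height);
    $rows = array();
    for ($y = 0; $y < $height; $y++) {
        $row = "";
        foreach ($columns as $col) {
            $row .= (isset($col[$y]) && $col[$y] === "1") ? "#" : ".";
        }
        $rows[] = $row;
    }
    return implode("\n", $rows) . "\n";
}

$line = "00000001110100000000";         // last line of the output
echo recognize($line, $known), "\n";    // !
echo recognize("1111", $known), "\n";   // ? (unknown string)
echo renderSample($line);               // 20 rows, one column wide
?>[/code]

An exact-match lookup like this only works when the font, size and sample size match the training run. Any difference in a single pixel produces a different string.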




3 Responses to “Basic OCR”

  1. toto

    Thanks for sharing. But I am confused by your program's output. How can we get a character from your binary number output? Can you share that?
    Thank you


    • John

      You will need to create a training set: basically, run this against characters you know and store each binary string in a database. Then, the next time you see that string, you can look up the character in the database.
