This is a very basic optical character recognition (OCR) script written in PHP. It is untested and serves merely as a proof of concept. As noted in the comments, adjusting the sample size can improve results: with a large sample size on a small image, different characters can produce the same output (collisions). To actually recognize text, the output of this script has to be compared against a database of known values.
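The post leaves the database unspecified. Here is a minimal sketch, assuming it is nothing more than a PHP array that maps signature strings (lines of the script's output) to the characters they are known to represent. The function `recognize`, its `$max_diff` threshold, and the nearest-match fallback by Hamming distance are illustrative assumptions, not part of the original script:

```php
<?php
/**
 * Hypothetical lookup: return the character whose stored signature matches
 * $signature exactly, or else the stored signature of the same length that
 * differs in the fewest pixels, as long as it differs in at most $max_diff
 * pixels. Returns null when nothing is close enough.
 */
function recognize(string $signature, array $known, int $max_diff = 2): ?string
{
    if (isset($known[$signature])) {
        return $known[$signature];
    }

    $best = null;
    $best_diff = $max_diff + 1;
    foreach ($known as $stored => $char) {
        // PHP turns keys such as "1111" into integers; cast back to a string.
        $stored = (string) $stored;
        if (strlen($stored) !== strlen($signature)) {
            continue; // different width, so not comparable pixel by pixel
        }
        // Hamming distance: number of pixels that differ.
        $diff = 0;
        for ($i = 0, $n = strlen($stored); $i < $n; $i++) {
            if ($stored[$i] !== $signature[$i]) {
                $diff++;
            }
        }
        if ($diff < $best_diff) {
            $best_diff = $diff;
            $best = $char;
        }
    }
    return $best;
}

// Tiny made-up table with 4-pixel signatures, for illustration only.
$known = ["0110" => "o", "1111" => "l"];
echo recognize("0100", $known), "\n"; // one pixel away from "0110"
```

An exact match only works when a glyph is rendered with the same font, size and sample size as the stored one. The distance fallback tolerates a few stray pixels but, as written, never compares signatures of different widths.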
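```php
<?php
/* Create a test image */
$im = @imagecreate(100, 20) or die("Cannot Initialize new GD image stream");
// imagecreate() makes a palette image: the first colour allocated is the
// background and gets index 0 (white); the text colour gets index 1 (black).
$background_color = imagecolorallocate($im, 255, 255, 255);
$text_color = imagecolorallocate($im, 0, 0, 0);
imagestring($im, 1, 5, 5, "Hello, World!", $text_color);

/***
 * Assumptions:
 * A monochrome image where characters are black
 * A single character is connected
 * Characters are separated by white space
 */
$width = imagesx($im);
$height = imagesy($im);

/***
 * Notes:
 * The smaller the sample size, the more accurate it will be;
 * however, it will take longer. Larger images can use a larger
 * sample size without compromising much accuracy.
 */
$x_sample = 1;
$y_sample = 1;

$last = 0;    // ink in the previous column (0 = blank column)
$l = 0;       // column index within the current character
$sample = ""; // signature of the character currently being read

for ($i = 0; $i < $width; $i++) {
    // Read column $i top to bottom: 0 = white, 1 = black (palette indices).
    $col = array();
    for ($j = 0; $j < $height; $j++) {
        $col[$j] = imagecolorat($im, $i, $j);
    }

    if (($current = array_sum($col)) > 0) {
        // A column with ink right after a blank column starts a new character.
        if ($last == 0) {
            $l = 0;
        }
        // Keep every $x_sample-th column and every $y_sample-th row.
        for ($k = 0; $k < $height; $k++) {
            if (($l % $x_sample) == 0 && ($k % $y_sample) == 0) {
                $sample .= $col[$k];
            }
        }
        $l++;
        $last = $current;
    } else {
        $last = 0;
    }

    // A blank column ends the current character: print its signature.
    if ($sample !== "" && $last == 0) {
        echo $sample . "\n";
        $sample = "";
    }
}

// Print a character that touches the right edge of the image.
if ($sample !== "") {
    echo $sample . "\n";
}
```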
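The collision problem mentioned in the notes can be shown without GD. The sketch below applies the script's sampling rule (keep column `$l` only if `$l % $x_sample == 0`, row `$k` only if `$k % $y_sample == 0`) to a full-resolution signature. The `subsample` helper and the two 4-pixel-high glyphs are made up for illustration, and the signature length is assumed to be a multiple of the height:

```php
<?php
/**
 * Hypothetical helper: re-sample a signature recorded with
 * $x_sample = $y_sample = 1 (columns of $height pixels written one after
 * another) using the same keep/skip rule as the script.
 */
function subsample(string $signature, int $height, int $x_sample, int $y_sample): string
{
    $out = "";
    foreach (str_split($signature, $height) as $l => $col) {
        if ($l % $x_sample != 0) {
            continue; // column skipped by the horizontal sample size
        }
        for ($k = 0; $k < $height; $k++) {
            if ($k % $y_sample == 0) {
                $out .= $col[$k];
            }
        }
    }
    return $out;
}

// Two made-up 2-column glyphs that differ only in their second column.
$a = "0110" . "1001";
$b = "0110" . "1111";

var_dump(subsample($a, 4, 2, 1) === subsample($b, 4, 2, 1)); // bool(true): collision
var_dump(subsample($a, 4, 1, 2) === subsample($b, 4, 1, 2)); // bool(false): still distinct
```

Skipping every other column makes these two glyphs indistinguishable, while skipping every other row keeps them apart in this particular case.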
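Output (each line represents one character: its pixel columns from left to right, each column written top to bottom as 20 digits, 1 for black and 0 for white):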
```text
00000011111100000000000000001000000000000000000010000000000000000011111100000000
00000000011000000000000000001011000000000000000011010000000000000000010100000000
000000100001000000000000001111110000000000000000000100000000
000000100001000000000000001111110000000000000000000100000000
00000000011000000000000000001001000000000000000010010000000000000000011000000000
000000000001000000000000000001100000000000000000010000000000
00000011111100000000000000000110000000000000000001100000000000000011111100000000
00000000011000000000000000001001000000000000000010010000000000000000011000000000
00000000111100000000000000000100000000000000000010000000000000000000010000000000
000000100001000000000000001111110000000000000000000100000000
00000000011000000000000000001001000000000000000010100000000000000011111100000000
00000001110100000000
```
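Because each output line is simply the character's pixel columns written one after another, a line can be turned back into a picture by splitting it into 20-digit columns and transposing them. This is a minimal sketch; `render_signature` is a hypothetical helper, and the height of 20 assumes the 100x20 test image with `$y_sample = 1`:

```php
<?php
/**
 * Hypothetical helper: turn a signature (columns of $height pixels written
 * one after another) back into rows of text, '#' for ink and '.' for paper.
 */
function render_signature(string $signature, int $height): array
{
    $columns = str_split($signature, $height);
    $rows = [];
    for ($k = 0; $k < $height; $k++) {
        $row = "";
        foreach ($columns as $col) {
            // Treat a missing pixel (short last column) as paper.
            $row .= (isset($col[$k]) && $col[$k] === "1") ? "#" : ".";
        }
        $rows[] = $row;
    }
    return $rows;
}

// First line of the output above, i.e. the "H" of "Hello".
$h = "00000011111100000000000000001000000000000000000010000000000000000011111100000000";
echo implode("\n", render_signature($h, 20)), "\n";
```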

4. January 2010