This is a very basic optical character recognition script written in PHP. It is untested and serves merely as a proof of concept. As noted in the code comments, adjusting the sample size can improve results: with a large sample size on a small image, different characters can reduce to the same binary string (collisions). The script only turns each character into a binary string; to identify the character, you need a database that maps strings produced from known characters to those characters.
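[code lang="PHP"]
<?php
/* Create a test image (requires the GD extension). */
$im = @imagecreate(100, 20) or die("Cannot Initialize new GD image stream");
// In a palette image the first colour allocated is the background, so
// imagecolorat() below returns index 0 for white and 1 for black.
$background_color = imagecolorallocate($im, 255, 255, 255);
$text_color = imagecolorallocate($im, 0, 0, 0);
imagestring($im, 1, 5, 5, "Hello, World!", $text_color);
/***
 * Assumptions:
 * A monochrome image where characters are black
 * A single character is connected (no fully white column splits it)
 * Characters are separated by at least one fully white column
 */
$width = imagesx($im);
$height = imagesy($im);
/***
 * Notes:
 * The smaller the sample size, the more accurate the result will be;
 * however, it will take longer and produce longer strings. Larger images
 * can use a larger sample size without compromising much accuracy.
 */
$x_sample = 1; // keep every $x_sample-th column of a character
$y_sample = 1; // keep every $y_sample-th pixel of a column
$last = 0;     // ink in the previous column (0 = blank column)
$l = 0;        // column index within the current character
$sample = "";  // binary string for the current character
for ($i = 0; $i < $width; $i++) {
    // Read column $i from top to bottom (0 = white, 1 = black).
    $col = array();
    for ($j = 0; $j < $height; $j++) {
        $col[$j] = imagecolorat($im, $i, $j);
    }
    if (($current = array_sum($col)) > 0) {
        // An inked column after a blank one starts a new character.
        if ($last == 0) {
            $l = 0;
        }
        if (($l % $x_sample) == 0) {
            for ($k = 0; $k < $height; $k += $y_sample) {
                $sample .= $col[$k];
            }
        }
        $l++;
        $last = $current;
    } else {
        $last = 0;
    }
    // A blank column ends the current character: print it and reset.
    if ($sample !== "" && $last == 0) {
        echo $sample . "\n";
        $sample = "";
    }
}
// Print the last character if it touches the right edge of the image.
if ($sample !== "") {
    echo $sample . "\n";
}
?>[/code]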
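Because the loop above is tied to GD and prints as it goes, it can be handy to have the same column-scanning segmentation as a function that returns the strings instead. The sketch below assumes the pixels are already in a plain array `$pixels[$y][$x]` of 0/1 values (1 = black); `segment_columns()` is a hypothetical name, and `$x_sample`/`$y_sample` mean the same as in the script.

[code lang="PHP"]
<?php
// Hypothetical refactor of the loop above: same column-scanning
// segmentation, but on a plain array $pixels[$y][$x] of 0/1 values
// (1 = black), returning one binary string per character.
function segment_columns(array $pixels, int $x_sample = 1, int $y_sample = 1): array
{
    $height = count($pixels);
    $width = $height > 0 ? count($pixels[0]) : 0;
    $chars = array();
    $sample = "";
    $l = 0; // column index within the current character
    for ($x = 0; $x < $width; $x++) {
        $ink = 0;
        for ($y = 0; $y < $height; $y++) {
            $ink += $pixels[$y][$x];
        }
        if ($ink > 0) {
            // Keep every $x_sample-th column and every $y_sample-th pixel.
            if ($l % $x_sample == 0) {
                for ($y = 0; $y < $height; $y += $y_sample) {
                    $sample .= $pixels[$y][$x];
                }
            }
            $l++;
        } elseif ($sample !== "") {
            // A blank column closes the current character.
            $chars[] = $sample;
            $sample = "";
            $l = 0;
        }
    }
    if ($sample !== "") {
        $chars[] = $sample; // character touching the right edge
    }
    return $chars;
}

// Two "characters" separated by two blank columns (x = 1 and x = 2).
$pixels = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
];
$chars = segment_columns($pixels); // $chars is ["110", "101110"]
echo implode("\n", $chars), "\n";
[/code]

With the palette image created by the script, `imagecolorat($im, $x, $y)` already returns 0 for white and 1 for black, so filling `$pixels` from it and passing the array to `segment_columns()` should yield the same strings the script prints.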
Output (each line represents one character; every 20 digits form one pixel column, read top to bottom, where 1 is black and 0 is white):
00000000011000000000000000001011000000000000000011010000000000000000010100000000
000000100001000000000000001111110000000000000000000100000000
000000100001000000000000001111110000000000000000000100000000
00000000011000000000000000001001000000000000000010010000000000000000011000000000
000000000001000000000000000001100000000000000000010000000000
00000011111100000000000000000110000000000000000001100000000000000011111100000000
00000000011000000000000000001001000000000000000010010000000000000000011000000000
00000000111100000000000000000100000000000000000010000000000000000000010000000000
000000100001000000000000001111110000000000000000000100000000
00000000011000000000000000001001000000000000000010100000000000000011111100000000
00000001110100000000
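To see what a line of this output encodes, you can split it back into columns and print it as ASCII art. This is a small sketch: `render_glyph()` is a hypothetical helper, and its `$height` argument must be the number of digits per column (20 here, since the image is 20 pixels tall and `$y_sample` is 1). Blank rows above and below the glyph are trimmed.

[code lang="PHP"]
<?php
// Hypothetical helper: turn one line of the script's output back into
// ASCII art ('#' = black, '.' = white).
function render_glyph(string $bits, int $height): string
{
    // Each group of $height digits is one pixel column, top to bottom.
    $columns = str_split($bits, $height);
    $rows = array();
    for ($y = 0; $y < $height; $y++) {
        $row = "";
        foreach ($columns as $column) {
            $row .= (isset($column[$y]) && $column[$y] === "1") ? "#" : ".";
        }
        $rows[] = $row;
    }
    // Trim blank rows above and below the glyph, keeping gaps inside it.
    $inked = array_keys(array_filter($rows, fn($r) => strpos($r, "#") !== false));
    if (empty($inked)) {
        return "";
    }
    $first = min($inked);
    $last = max($inked);
    return implode("\n", array_slice($rows, $first, $last - $first + 1));
}

// Prints the last line of the output above as:
// #
// #
// #
// .
// #
echo render_glyph("00000001110100000000", 20), "\n";
[/code]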
Thanks for sharing, but I'm confused by your program's output. How can we get the characters from your binary output? Can you share that?
Thank you
You will need to create a training set: run this script against characters you already know and store each resulting binary string in a database along with its character. The next time you see that string, you can look it up in the database to find the character.
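A minimal sketch of that lookup, with an in-memory PHP array standing in for the database. `train()` and `recognize()` are hypothetical names; the pairs you train on would come from running the script over characters you already know. The sketch also reports strings that map to more than one character, which are the collisions that larger sample sizes make more likely.

[code lang="PHP"]
<?php
// Hypothetical training step: build a lookup table from pairs of
// [binary string, known character]. An in-memory array stands in for
// the database. Strings seen with two different characters are reported
// as collisions; the later character overwrites the earlier one.
function train(array $pairs): array
{
    $table = array();
    $collisions = array();
    foreach ($pairs as [$bits, $char]) {
        if (isset($table[$bits]) && $table[$bits] !== $char) {
            $collisions[] = $bits;
        }
        $table[$bits] = $char;
    }
    return [$table, $collisions];
}

// Hypothetical lookup step: map each line of the script's output to a
// character, using "?" for strings that are not in the table.
function recognize(array $table, string $output): string
{
    $text = "";
    foreach (explode("\n", trim($output)) as $bits) {
        $text .= $table[$bits] ?? "?";
    }
    return $text;
}

[$table, $collisions] = train([
    ["00000001110100000000", "!"],
]);
echo recognize($table, "00000001110100000000\n"), "\n"; // prints "!"
[/code]

A real database table would be keyed on the binary string in the same way, so the lookup stays a single query per character.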