Neural Networks

http://www.codeproject.com/csharp/Neural_Network_OCR.asp
(Note: Knowledge of C# / Java / C++ is recommended.) 

The beauty of neural networks is that they’re inherently generalized with respect to inputs and outputs.  Simply put, no type of neural network is dedicated to one particular task.  While there are many implementations and algorithms that can be used to create a neural network, they can all be trained to recognize text in images, to detect which samples of protein crystals are likely to hold up the longest under X-ray diffraction (there was a project at BOOM involving this), and so forth.  Best of all, you don’t have to work out by hand how to configure them to get the results you want!  Simply provide a network with several sets of sample inputs and their expected outputs, run the network in a “training mode” for a few thousand iterations, and the neural network “figures out” what aspects of a particular input make it unique, as well as how to categorize a wide assortment of additional input.  Of course, the accuracy of the network depends on how many sample inputs you provide and how many iterations you run in training mode. 
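To make the idea of "training mode" concrete, here is a minimal sketch in Python.  It is not the code from the linked article: it assumes a single layer of linear neurons updated with the delta rule, and the names `train` and `predict`, the learning rate, the iteration count, and the toy 4-pixel samples are all made up for illustration.

```python
import random

def train(samples, n_inputs, n_outputs, iterations=1000, rate=0.1, seed=0):
    """Train one layer of linear neurons with the delta rule (an assumed method).

    samples: list of (input_vector, expected_output_vector) pairs.
    Returns one row of weights per output neuron; the last weight is a bias.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
               for _ in range(n_outputs)]
    for _ in range(iterations):
        for inputs, expected in samples:
            x = list(inputs) + [1.0]  # append a constant input for the bias
            for j in range(n_outputs):
                actual = sum(w * v for w, v in zip(weights[j], x))
                error = expected[j] - actual
                # Nudge each weight in the direction that shrinks the error.
                for i in range(n_inputs + 1):
                    weights[j][i] += rate * error * x[i]
    return weights

def predict(weights, inputs):
    """Run an input vector through the trained layer."""
    x = list(inputs) + [1.0]
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Hypothetical toy data: two 4-"pixel" patterns, one output neuron for each.
samples = [
    ([0.5, 0.5, -0.5, -0.5], [0.5, -0.5]),
    ([-0.5, -0.5, 0.5, 0.5], [-0.5, 0.5]),
]
weights = train(samples, n_inputs=4, n_outputs=2)
outputs = predict(weights, samples[0][0])
```

After training, `outputs` for the first pattern is high for the first neuron and low for the second, even though nobody told the network which pixels matter.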

The article discusses the implementation of a C# application that trains a neural network to perform OCR (optical character recognition).  The input for this application is a 5×6-pixel, black-and-white image of a letter.  The output is a vector (array) of probabilities that the input matches each of the 26 possible letters, which can then be used to determine which ASCII (text) character the image most likely represents.  It is important to note that neural networks cannot be expected to produce perfectly accurate results, as their training is, in essence, based upon learning by trial and error (with some heuristics thrown in to help accelerate the process).  As such, the neural network used in the article will not flatly classify, for example, an image of the letter K as the letter “K”.  Rather, it will classify it as x% “K”, where x is some high percentage (assuming enough training has occurred).  This is why most applications of neural networks operate with the concept of an accuracy threshold and accept results that are correct with some high probability, and this is why even professional / commercial OCR applications make mistakes. 
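The accuracy threshold can be sketched in a few lines of Python.  This is only an illustration, not the article's code: the function name `classify`, the 0.9 default threshold, and the sample output values are assumptions, and the outputs are assumed to be probabilities between 0 and 1.

```python
import string

def classify(outputs, threshold=0.9):
    """Pick the most likely letter from a 26-element output vector.

    Returns (letter, confidence), or (None, confidence) when the best
    candidate falls below the (assumed) accuracy threshold.
    """
    if len(outputs) != 26:
        raise ValueError("expected one value per letter of the alphabet")
    best = max(range(26), key=lambda i: outputs[i])
    confidence = outputs[best]
    letter = string.ascii_uppercase[best]
    return (letter if confidence >= threshold else None, confidence)

# Hypothetical network output: 95% "K", with the rest spread over other letters.
outputs = [0.002] * 26
outputs[10] = 0.95  # index 10 is the 11th letter, "K"

accepted = classify(outputs)                  # clears the default threshold
rejected = classify(outputs, threshold=0.99)  # too strict for this result
```

Raising the threshold trades fewer wrong answers for more "don't know" answers, which is the balance an OCR application has to strike.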

In this program, a neuron is essentially a set of code that models physical neurons in the brain.  Because this program uses a single-layer neural network, each neuron accepts an input vector and is connected to an output vector.  Multiple-layer neural networks also exist, and are generally used to perform a sequence of increasingly concrete classifications when there are too many possibilities to create training data for all potential inputs, and the speed of classification is important.  (For example, a three-layer network might classify an image of a dog as an “animal”, then as “four-legged”, then as a “dog”.  The separate layers of classifications allow the network to reject impossible classifications early on.  For example, if the aforementioned three-layer network were then given an image of a plant, it could reject everything that is a subclass of “animal” early in the analysis, thereby increasing the speed at which it runs.)  An input vector is a set of data that describes the input; in this case, it’s a 30-element array of floating-point values (one value for each pixel) that represent whether a pixel is “on” (black, = 0.5f) or “off” (white, = -0.5f) in the letter that the image represents.  The output vector in this case is a 26-element array of floating-point values, with all elements initially set to -0.5f, with the ith element representing the probability that the input letter is the ith letter of the alphabet. 
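As a rough Python sketch of these two vectors (not the article's C# code): the 0.5 / -0.5 values come from the description above, while the 5-wide by 6-tall layout, the row-by-row pixel order, the "#" and "." notation, the hand-drawn "K", and the choice of 0.5 for the correct letter in the training target are all assumptions made for illustration.

```python
ON, OFF = 0.5, -0.5  # black and white pixel values, as described in the post

def encode_image(rows):
    """Flatten a 6-row by 5-column image into a 30-element input vector.

    rows: list of strings, '#' for a black pixel and '.' for a white one.
    """
    if len(rows) != 6 or any(len(row) != 5 for row in rows):
        raise ValueError("expected 6 rows of 5 pixels")
    return [ON if ch == "#" else OFF for row in rows for ch in row]

def target_vector(letter):
    """Build a 26-element training target: -0.5 everywhere except the letter."""
    index = ord(letter.upper()) - ord("A")
    if not 0 <= index < 26:
        raise ValueError("expected a letter from A to Z")
    target = [OFF] * 26
    target[index] = ON  # assumed value for the correct letter
    return target

# A hand-drawn letter K, purely illustrative.
LETTER_K = [
    "#...#",
    "#..#.",
    "###..",
    "#..#.",
    "#...#",
    "#...#",
]

input_vector = encode_image(LETTER_K)
expected = target_vector("K")
```

Pairs such as (`input_vector`, `expected`) are what a training routine would consume, one pair per sample image.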

To recognize a letter in an image, the program loops through every element in the input vector and sends it through the network.  Each neuron “fires” (outputs 1) if its input is probably (as indicated by training) in the output letter it is representing (e.g. the 3rd neuron represents “C”).  The average of all neuron outputs then gives the probability that the image represents the letter being checked.  Once all possible letters have been processed, the one with the highest probability of being correct is chosen as the proper interpretation of the image.  Commercial OCR applications work in (practically) the same way; they also include proprietary heuristics that take into account the context of the word being recognized, as well as the word itself, to limit the set of possible outputs and increase recognition accuracy. 
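The "score every letter, keep the best" step might look something like the Python sketch below.  It simplifies heavily and is not the article's implementation: it uses a 3×3 grid and three letters instead of 5×6 and 26, the weights are copied by hand from template images rather than learned in training, and the scaled tanh activation is an assumed choice that keeps outputs between -0.5 and 0.5.

```python
import math

# Hypothetical 3x3 templates: '#' is a black pixel, '.' a white one.
TEMPLATES = {
    "I": [".#.", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
    "T": ["###", ".#.", ".#."],
}

def encode(rows):
    """Flatten an image into a vector of 0.5 (black) and -0.5 (white)."""
    return [0.5 if ch == "#" else -0.5 for row in rows for ch in row]

# One neuron per letter. Its weights are simply its template here; a real
# network would learn them from sample images instead.
WEIGHTS = {letter: encode(rows) for letter, rows in TEMPLATES.items()}

def neuron_output(weights, inputs):
    """Weighted sum of the inputs, squashed into the range (-0.5, 0.5)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 0.5 * math.tanh(total)

def recognize(rows):
    """Score the image against every letter and return the best match."""
    inputs = encode(rows)
    scores = {letter: neuron_output(w, inputs) for letter, w in WEIGHTS.items()}
    best = max(scores, key=scores.get)
    return best, scores

# A "T" with one stray black pixel in the bottom-right corner.
letter, scores = recognize(["###", ".#.", ".##"])
```

Even with the stray pixel, the "T" neuron produces the highest output, so `letter` comes back as "T"; the other scores show how close the runners-up were.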

The benefits of neural networks, however, are not limited to OCR.  Because they are indifferent with regard to the types of input they can process, and their output is determined by training, neural networks can be implemented for an astounding variety of tasks.  Future applications include improved searches (done by analyzing potential matches, context of queries, past queries, and so forth), improved identification and security systems, and new ways of interacting with computers (e.g. through analysis of brain waves / “reading your mind”). 

Posted in Topics: Mathematics, Science, Technology
