Daniel Hall: Deep Learning and Sequence Learning Technology

Lei Feng Network editor's note: The author of this article, Dalong, graduated from the Institute of Computing Technology, Chinese Academy of Sciences, in July 2011. He was formerly a senior R&D engineer at the Baidu Institute of Deep Learning (IDL), where he won Baidu's highest award, the million-dollar prize, twice in a row. He is now at Horizon Robotics, responsible for algorithm research and development for autonomous service robots, smart homes, and toys, covering deep learning, computer vision, human-computer interaction, SLAM, robot planning and control, and other fields.

Deep learning dominance

Since 2006, when Geoffrey Hinton published his famous paper in Science, the deep learning craze has swept from academia to industry.

From that day on, the application of deep learning in industry has been in full swing, and it has begun to have a truly "deep" effect on our lives. The author took part in developing some of the earliest image recognition technology based on CDNN, which greatly improved computer-vision-related online applications, and led the development of an OCR recognition system based on CNN and BLSTM that improved the recognition rate of commercial OCR systems. His work has influenced millions of Internet users, including you and me.

Where the advantages of deep learning lie

One of the defining characteristics of artificial intelligence is the ability to learn: system performance improves as empirical data accumulates. We believe deep learning has huge advantages, mainly in the following three respects:

1. From the point of view of statistics and computation, deep learning is particularly well suited to processing large amounts of data. On many problems, deep learning is the best method we can currently find.

2. Deep learning is not a black-box system. It provides a rich, connection-based modeling language (modeling framework). Using this language, we can express rich relationships and structures in data, for example using convolution to handle the two-dimensional structure of images, and recurrent neural networks (Recurrent Neural Network, RNN) to handle the temporal structure of data such as natural language.

3. Deep learning is almost the only end-to-end machine learning system. It acts directly on the raw data, learns features automatically layer by layer, and the whole process directly optimizes a single objective function.

Video from the Sequence Learning technology-sharing session

Starting with the 2012 ImageNet competition, deep learning first showed its power in the field of image recognition. As research has deepened, deep learning has been applied to audio, video, and natural language understanding. These areas are characterized by the modeling of temporal data, which we call Sequence Learning. How to use deep learning for end-to-end learning, abandoning intermediate steps based on hand-crafted rules, in order to improve the results of Sequence Learning has become a hot topic of current research.

Sequence Learning has been successfully applied in a number of areas, such as speech recognition, image captioning, machine translation, and OCR. Their common feature is that a DNN or CNN extracts high-level semantic features and an RNN models the temporal information. In terms of loss functions, in addition to the usual logistic loss, structured losses such as CTC and sequence-to-sequence losses have been introduced.

A variant of the simple RNN: LSTM

The CTC structured loss function

In Sequence Learning, we believe that RNNs and structured loss functions over sequences are important ingredients of success. Besides the traditional simple RNN, many RNN variants have appeared, such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), which are widely applied to Sequence Learning tasks. They all have a specific recurrent structure and use a series of gates to model long-term information adaptively, which to some extent overcomes the vanishing or exploding gradients that a simple RNN suffers from during optimization. CTC, as a structured loss function, does not require the sequence data to be segmented; it estimates the probability of the overall sequence labeling and uses that as the loss. It has been widely applied to OCR, speech recognition, and other sequence recognition tasks.
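To make the idea of "estimating the probability of the overall sequence labeling" more concrete, here is a minimal sketch of the standard CTC forward recursion in plain Python. It is an illustration, not the system described in this article: the function name, the choice of index 0 as the blank symbol, and the use of raw probabilities instead of log-space arithmetic (which a real implementation would need for long sequences) are all assumptions made for brevity.

```python
import math

def ctc_neg_log_likelihood(probs, target, blank=0):
    """CTC loss for one sequence.

    probs[t][k] is the per-frame probability of symbol k at frame t;
    target is the list of label ids without blanks.
    """
    # Interleave blanks around the labels: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all alignments of the frames seen so far
    # that end at position s of the extended label sequence.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

Note that no per-character positions appear anywhere in the inputs: the loss sums over every alignment of frames to labels, which is why the training data does not need to be segmented.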

Here we take OCR as an example to describe how machine learning, and Sequence Learning technology in particular, can be used to upgrade traditional OCR technology.

Transforming the traditional optical character recognition framework with end-to-end Sequence Learning

The author explaining RNNs at the whiteboard

The concept of optical character recognition (OCR) was proposed as early as the 1920s, and it is a representative and important research problem in the field of pattern recognition.

A classic OCR system, from the input image to the final output, goes through layout analysis, line segmentation, character segmentation, single-character recognition, language-model decoding, and post-processing. The techniques involved fall into two categories: rules developed from experience, and statistical learning models. The former includes binarization, connected-component analysis, and projection analysis in the preprocessing stages (layout analysis, line segmentation, character segmentation), as well as rule-based noise filters in the post-processing stage. The latter includes the single-character recognition engine based on histogram of oriented gradients (Histogram of Oriented Gradient, HOG) features and the N-gram language model, used in the single-character recognition and language-model decoding stages.

On simple data under controlled conditions, the classic OCR architecture can achieve good recognition accuracy through hand-crafted rules and a modest number of model parameters. But in wide-ranging natural scenes, the complexity of the images in which text appears increases significantly and imaging conditions are hard to control, so classic OCR technology struggles to meet the needs of practical applications. The reasons are that its long and cumbersome processing pipeline lets errors propagate from stage to stage, and that it relies too heavily on hand-crafted rules while neglecting large-scale training data.

Solutions

To address complex scenes and the shortcomings of the classic technical framework, we used machine learning, and Sequence Learning technology in particular, to substantially rework the system pipeline and technical framework of optical character recognition.

In terms of the system pipeline, we abandoned the traditional rule-based methods of binarization and connected-component analysis and introduced learning-based text detection using Boosting, merging it with line segmentation into a new preprocessing module whose task is to detect the regions of the image that contain text and generate the corresponding text lines. Character segmentation and single-character recognition were merged into a new whole-line recognition module. The N-gram language-model decoding module was retained, but the layout analysis and post-processing modules, which relied mainly on hand-crafted rules, were removed from the system. Reducing six steps to three lessens the adverse impact of error propagation.
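As a rough illustration of what an N-gram language-model decoding step can contribute, the sketch below trains a character bigram model with add-one smoothing and uses it to rescore candidate transcriptions of a line. Everything here is hypothetical and chosen for brevity (the function names, the "^" start marker, the tiny corpus, the `lm_weight` parameter); the article does not describe the actual decoder, which would typically use a higher-order model and search over a lattice rather than a short candidate list.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count character bigrams over a non-empty list of strings."""
    contexts, bigrams = Counter(), Counter()
    for text in corpus:
        padded = "^" + text                      # "^" marks the start of a line
        contexts.update(padded[:-1])             # every character that has a successor
        bigrams.update(zip(padded, padded[1:]))
    vocab_size = len({ch for text in corpus for ch in text})
    return contexts, bigrams, vocab_size

def log_prob(text, model):
    """Add-one smoothed log probability of a string under the bigram model."""
    contexts, bigrams, vocab_size = model
    padded = "^" + text
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

def rescore(candidates, model, lm_weight=1.0):
    """Pick the candidate with the best recognizer score plus weighted LM score.

    candidates is a list of (text, recognizer_log_score) pairs.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * log_prob(c[0], model))[0]

model = train_bigram(["hello", "help", "hello"])
# The recognizer slightly prefers "he11o", but the language model overrules it.
best = rescore([("hello", -1.0), ("he11o", -0.9)], model)
```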

Furthermore, since recognizing an entire line of text is a Sequence Learning problem, we specifically developed a sequence model learning algorithm for recurrent neural networks based on bidirectional long short-term memory (Bidirectional Long Short-term Memory, BLSTM), combined with image features extracted by a convolutional neural network model. Without considering the specific location of each character, it maps the whole image sequence directly to the text content, solving character segmentation and single-character recognition jointly and ultimately achieving what deep learning theory pursues: end-to-end training.
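The following is a minimal sketch of how a bidirectional LSTM processes a sequence of per-column feature vectors, written in plain Python so the gate arithmetic is visible. It shows the standard LSTM equations only; the parameter layout, function names, sizes, and random untrained weights are assumptions for illustration and are not taken from the system described here.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def init_lstm(n_in, n_hidden, rng):
    """Random (untrained) weights for the input, forget, output and candidate gates."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                for _ in range(n_hidden)]
    return {gate: (mat(), [0.0] * n_hidden) for gate in "ifog"}

def lstm_step(params, x, h, c):
    z = x + h                                            # concatenate input and previous state
    i = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate
    f = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate
    o = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate
    g = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate cell values
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def run_lstm(params, xs, n_hidden):
    """Run one LSTM over the sequence xs and return the hidden state at every step."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs

def blstm(fwd, bwd, xs, n_hidden):
    """Concatenate a left-to-right pass with a right-to-left pass at each position."""
    left_to_right = run_lstm(fwd, xs, n_hidden)
    right_to_left = run_lstm(bwd, xs[::-1], n_hidden)[::-1]
    return [a + b for a, b in zip(left_to_right, right_to_left)]

rng = random.Random(0)
fwd, bwd = init_lstm(3, 4, rng), init_lstm(3, 4, rng)
columns = [[rng.random() for _ in range(3)] for _ in range(5)]  # stand-in for CNN features
features = blstm(fwd, bwd, columns, 4)  # 5 positions, 8 values each
```

Each output position sees the whole line: the first half of the vector summarizes everything to its left, the second half everything to its right.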

This makes full use of the context of the text sequence for disambiguation and avoids the irreversible errors caused by traditional character segmentation. This sequence learning model is good at recognizing sequences that are hard to segment into characters, even scrawled handwritten phone numbers. In addition, the sequence learning model greatly reduces the difficulty of annotating training data, which makes it easier to collect larger-scale training data. OCR problems for different languages (even ones whose words and sentences differ greatly in length and structure) can also be handled within one unified technical framework, dramatically reducing system maintenance costs.
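One common way to turn per-frame outputs into text without any character segmentation is CTC best-path (greedy) decoding: take the most likely symbol in each frame, collapse repeats, and drop blanks. The sketch below assumes a CTC-style output layer with the blank at index 0; the function name and the toy alphabet are hypothetical, and a production system would more likely use beam search together with a language model.

```python
def ctc_greedy_decode(probs, alphabet, blank=0):
    """Best-path decoding: argmax per frame, merge repeats, remove blanks.

    probs[t][k] is the score of symbol k at frame t; alphabet[k] is its character.
    """
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    chars = []
    prev = None
    for k in best:
        # A symbol is emitted only when it differs from the previous frame,
        # so a blank between two identical symbols keeps both of them.
        if k != blank and k != prev:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)

alphabet = ["-", "1", "3", "8"]          # index 0 is the blank placeholder
frames = [
    [0.1, 0.7, 0.1, 0.1],                # "1"
    [0.1, 0.6, 0.2, 0.1],                # "1" again, merged with the previous frame
    [0.8, 0.1, 0.05, 0.05],              # blank
    [0.2, 0.5, 0.2, 0.1],                # "1", a new character after the blank
    [0.1, 0.1, 0.1, 0.7],                # "8"
]
text = ctc_greedy_decode(frames, alphabet)   # "118"
```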

Summary and Outlook

As practitioners of deep learning and Sequence Learning, we have gained a lot of valuable experience and knowledge:

1. Rich image perturbation is an effective means of feeding our prior knowledge about images into deep learning. Unlike many other kinds of data, images and video have good continuity and structure in the time and space dimensions, and contain a lot of redundant information. Image processing transformations such as translation, flipping, rotation, scaling, Gaussian and salt-and-pepper noise, and shearing can generate large amounts of effective training data and make deep learning models more robust.

2. RNN, as a modeling language for sequence information, can effectively model the dependencies within a sequence. An RNN can use its internal memory to process input sequences of arbitrary length, which greatly reduces the difficulty of tasks such as video processing, speech recognition, and semantic understanding.

3. Structured loss functions are an effective way of feeding model-based knowledge into deep learning. When a model is used to post-process the output of a deep learning model, a targeted structured loss function can often help the learning process converge more quickly to a better state.
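The image perturbations mentioned in the first point can be sketched in a few lines. This example covers only translation, flipping, and salt-and-pepper noise on a grayscale image stored as a list of rows; the function names, noise rate, and pixel range are assumptions for illustration, and rotation, scaling, Gaussian noise, and shearing would normally come from an image library.

```python
import random

def shift_right(image, dx, fill=0):
    """Translate the image dx pixels to the right (assumes 0 <= dx <= width)."""
    return [[fill] * dx + row[:len(row) - dx] for row in image]

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def salt_and_pepper(image, amount, rng, low=0, high=255):
    """Set roughly a fraction `amount` of pixels to pure black or pure white."""
    noisy = [row[:] for row in image]        # copy so the original is untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((low, high))
    return noisy

def augment(image, rng):
    """Produce one randomly perturbed training sample from a clean image."""
    out = shift_right(image, rng.randint(0, 2))
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return salt_and_pepper(out, 0.05, rng)

rng = random.Random(0)
clean = [[10, 20, 30, 40], [50, 60, 70, 80]]
samples = [augment(clean, rng) for _ in range(4)]   # four perturbed copies
```

Whether a given transformation preserves the label depends on the task, so the set of perturbations should be chosen with the data in mind.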

Looking to the future, sequence recognition based on deep learning can focus on the following directions:

Reinforcement learning

Compared with convolutional neural networks and recurrent neural networks, models produced by reinforcement learning can generate their input sequence more flexibly according to the characteristics of the data, and can be trained with vaguer supervision. This can reduce model complexity, improve speed, and substantially reduce the difficulty of annotating training data, so that learning and prediction require little human intervention and come closer to a truly intelligent mode of learning.

Attention model

As an abstract concept, attention simulates human recognition behavior: rather than using only the state information at the current step of the sequence, during decoding it adaptively weights the information from earlier states of the sequence, so that all of the contextual information can be used.
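The adaptive weighting described above can be sketched as follows. This is a minimal example assuming dot-product scoring between a decoder query and each earlier state, which is only one of several scoring functions used in practice; the function names and toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Weight each state by its similarity to the query and return the blend.

    Returns (weights, context), where context is the weighted sum of states.
    """
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states))
               for d in range(dim)]
    return weights, context

states = [[1.0, 0.0], [0.0, 1.0]]      # two earlier states of the sequence
weights, context = attend([2.0, 0.0], states)
# The first state matches the query better, so it receives the larger weight.
```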

Lei Feng Network note: this article is published on Lei Feng Network with authorization from Horizon Robotics (search for "Lei Feng Network" to follow the public account). If you wish to reproduce it, please contact the original author.
