Deep Neural Networks for Large Vocabulary Handwritten Text Recognition (PhD work)

Thesis

Abstract

The automatic transcription of text in handwritten documents has many applications, from automatic document processing to indexing and document understanding. One of the most popular approaches nowadays consists of scanning the text line image with a sliding window, extracting features from each window, and modeling the resulting feature sequence with Hidden Markov Models (HMMs). Combined with Neural Networks, such as Multi-Layer Perceptrons (MLPs) or Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), and with a language model, these models yield good transcriptions. Meanwhile, in many machine learning applications, including speech recognition and computer vision, deep Neural Networks consisting of several hidden layers have recently produced significant reductions in error rates.
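The sliding-window pipeline described above can be sketched as follows. This is a minimal illustration, assuming a binarized line image stored as a list of pixel rows (1.0 for ink, 0.0 for background) and using two hypothetical toy features per window (ink density and vertical center of mass); the handcrafted features actually used in the thesis are not described on this page. Each window becomes one frame of the feature sequence that the HMM (or hybrid NN/HMM) then models.

```python
from typing import List, Sequence

# Assumption: a binarized image as a list of pixel rows, 1.0 = ink, 0.0 = background.
Image = Sequence[Sequence[float]]


def sliding_windows(image: Image, width: int, shift: int) -> List[List[List[float]]]:
    """Cut a text-line image into (possibly overlapping) vertical windows, left to right."""
    n_cols = len(image[0])
    windows = []
    for start in range(0, n_cols - width + 1, shift):
        windows.append([list(row[start:start + width]) for row in image])
    return windows


def window_features(window: List[List[float]]) -> List[float]:
    """Hypothetical toy features: ink density and normalized vertical center of mass."""
    height = len(window)
    ink_per_row = [sum(row) for row in window]
    total = sum(ink_per_row)
    density = total / (height * len(window[0]))
    # An empty window gets a center of mass in the middle of the line.
    center = sum(i * v for i, v in enumerate(ink_per_row)) / total if total else height / 2
    return [density, center / height]


def extract_frames(image: Image, width: int = 3, shift: int = 1) -> List[List[float]]:
    """Turn a line image into a sequence of feature vectors, one per window position."""
    return [window_features(w) for w in sliding_windows(image, width, shift)]


if __name__ == "__main__":
    line = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
    ]
    for frame in extract_frames(line, width=3, shift=1):
        print(frame)
```

With a shift smaller than the window width, consecutive frames overlap, which is the usual choice when the frame sequence is fed to an HMM.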

In this thesis, we conduct a thorough study of different aspects of optical models based on deep Neural Networks in the hybrid Neural Network / HMM scheme, in order to better understand and evaluate their relative importance. First, we show that deep Neural Networks produce consistent and significant improvements over networks with one or two hidden layers, regardless of the type of network (MLP or RNN) and of the input (handcrafted features or pixels). Then, we show that deep Neural Networks with pixel inputs compete with those using handcrafted features, and that depth plays an important role in reducing the performance gap between the two kinds of inputs. This supports the idea that deep Neural Networks effectively build hierarchical and relevant representations of their inputs, and that features are learned automatically along the way. Despite the dominance of LSTM-RNNs in the recent handwriting recognition literature, we show that deep MLPs achieve comparable results. We also evaluate different training criteria. With sequence-discriminative training, we report improvements for MLP/HMMs similar to those observed in speech recognition. Moreover, we show how the Connectionist Temporal Classification (CTC) framework is especially suited to RNNs. Finally, dropout, a novel technique for regularizing neural networks, was recently applied to LSTM-RNNs. We test its effect at different positions in LSTM-RNNs, thus extending previous work, and show that its position relative to the recurrent connections is important.
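In the Connectionist Temporal Classification framework, the network emits at each frame either a character or a special blank label, and a frame-level path is mapped to a transcription by merging consecutive repeated labels and then removing blanks. The sketch below shows that mapping together with greedy (best-path) decoding, assuming the per-frame network scores are given as plain lists and that index 0 is the blank; both are illustrative assumptions, not details taken from the thesis. Best-path decoding only approximates the most probable transcription; it is shown here because it makes the collapse rule explicit.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: output index 0 is the CTC blank label


def collapse(path: Sequence[int], blank: int = BLANK) -> List[int]:
    """CTC mapping from a frame-level path to a label sequence:
    merge consecutive repeated labels, then drop blanks."""
    labels: List[int] = []
    previous: Optional[int] = None
    for label in path:
        if label != previous and label != blank:
            labels.append(label)
        previous = label
    return labels


def best_path_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Greedy decoding: take the highest-scoring label at each frame, then collapse."""
    path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    return collapse(path, blank)


if __name__ == "__main__":
    # Hypothetical per-frame scores over {blank, label 1, label 2}.
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print(best_path_decode(scores))
```

Note that the path `[1, 1, 0, 1]` collapses to `[1, 1]`: a blank between two identical labels is what allows a character to be repeated in the transcription.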

We conducted the experiments on three public databases, covering two languages (English and French) and two historical periods, using two kinds of neural network inputs: handcrafted features and pixels. We validated our approach by taking part in the HTRtS contest in 2014. The final systems presented in this thesis, namely MLPs and RNNs with handcrafted-feature or pixel inputs, achieve results comparable to the state of the art on Rimes and IAM. Moreover, the combination of these systems outperformed all published results on the considered databases.

Keywords

Deep Neural Networks, Multi-Layer Perceptron, Recurrent Neural Networks, Hidden Markov Models, Dropout, Handwriting Recognition

Supervisors

  Prof. Hermann Ney, RWTH Aachen, LIMSI CNRS
  Christopher Kermorvant, Teklia, A2iALab

Publications

2015

  • Théodore Bluche (2015) Deep Neural Networks for Large Vocabulary Handwritten Text Recognition. PhD thesis
  • Théodore Bluche, Hermann Ney, Jérôme Louradour, Christopher Kermorvant (2015) Framewise and CTC Training of Neural Networks for Handwriting Recognition. In 13th International Conference on Document Analysis and Recognition (ICDAR), 81-85.
  • Théodore Bluche, Hermann Ney, Christopher Kermorvant (2015) The LIMSI Handwriting Recognition System for the HTRtS 2014 Contest. In 13th International Conference on Document Analysis and Recognition (ICDAR), 86-90.
  • Théodore Bluche, Christopher Kermorvant, Jérôme Louradour (2015) Where to Apply Dropout in Recurrent Neural Networks for Handwriting Recognition? In 13th International Conference on Document Analysis and Recognition (ICDAR), 681-685.
  • Dominique Stutzmann, Théodore Bluche, Alexei Lavrentev, Yann Leydier, Christopher Kermorvant (2015) From Text and Image to Historical Resource: Text-Image Alignment for Digital Humanists. In Digital Humanities (DH2015).

2014

  • Théodore Bluche, Hermann Ney, Christopher Kermorvant (2014) A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition. In International Conference on Statistical Language and Speech Processing (SLSP), 199-210.
  • Théodore Bluche, Bastien Moysset, Christopher Kermorvant (2014) Automatic Line Segmentation and Ground-Truth Alignment of Handwritten Documents. In International Conference on Frontiers in Handwriting Recognition (ICFHR), 667-672.
  • Bastien Moysset, Théodore Bluche, Maxime Knibbe, Mohamed Faouzi Benzeghiba, Ronaldo Messina, Jérôme Louradour, Christopher Kermorvant (2014) The A2iA Multi-lingual Text Recognition System at the Maurdor Evaluation. In International Conference on Frontiers in Handwriting Recognition (ICFHR), 297-302.
  • Vu Pham, Théodore Bluche, Christopher Kermorvant, Jérôme Louradour (2014) Dropout Improves Recurrent Neural Networks for Handwriting Recognition. In International Conference on Frontiers in Handwriting Recognition (ICFHR), 285-290.
  • Théodore Bluche, Jérôme Louradour, Maxime Knibbe, Bastien Moysset, Faouzi Benzeghiba, Christopher Kermorvant (2014) The A2iA Arabic Handwritten Text Recognition System at the OpenHaRT2013 Evaluation. In International Workshop on Document Analysis Systems (DAS), 161-165.

2013

  • Théodore Bluche, Hermann Ney, Christopher Kermorvant (2013) Feature Extraction with Convolutional Neural Networks for Handwritten Word Recognition. In 12th International Conference on Document Analysis and Recognition (ICDAR), 285-289.
  • Théodore Bluche, Hermann Ney, Christopher Kermorvant (2013) Tandem HMM with Convolutional Neural Network for Handwritten Word Recognition. In 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2390-2394.

2012

  • Jérôme Louradour, Théodore Bluche, Anne-Laure Bianne-Bernard, Farès Menasri, Christopher Kermorvant (2012) De l'usage des scores et des alternatives de reconnaissance pour la classification d'images de documents manuscrits [On the Use of Recognition Scores and Alternatives for the Classification of Handwritten Document Images]. In Colloque International Francophone sur l'Écrit et le Document.