HMM-based Indic handwritten word recognition using zone segmentation

Pattern Recognition (PR)
Handwriting
Authors
Affiliations

Partha Pratim Roy

Indian Institute of Technology, Roorkee

Ayan Kumar Bhunia

Institute of Engineering & Management (IEM), Kolkata

Ayan Das

Institute of Engineering & Management (IEM), Kolkata

Prasenjit Dey

Institute of Engineering & Management (IEM), Kolkata

Umapada Pal

CVPR Unit, ISI Kolkata

Published

December 1, 2016

Paper

Abstract

This paper presents a novel approach towards Indic handwritten word recognition using zone-wise information. Because of complex nature due to compound characters, modifiers, overlapping and touching, etc., character segmentation and recognition is a tedious job in Indic scripts (e.g. Devanagari, Bangla, Gurumukhi, and other similar scripts). To avoid character segmentation in such scripts, HMMbased sequence modeling has been used earlier in holistic way. This paper proposes an efficient word recognition framework by segmenting the handwritten word images horizontally into three zones (upper, middle and lower) and recognize the corresponding zones. The main aim of this zone segmentation approach is to reduce the number of distinct component classes compared to the total number of classes in Indic scripts. As a result, use of this zone segmentation approach enhances the recognition performance of the system. The components in middle zone where characters are mostly touching are recognized using HMM. After the recognition of middle zone, HMM based Viterbi forced alignment is applied to mark the left and right boundaries of the characters. Next, the residue components, if any, in upper and lower zones in their respective boundary are combined to achieve the final word level recognition. Water reservoir feature has been integrated in this framework to improve the zone segmentation and character alignment defects while segmentation. A novel sliding window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG) is proposed for middle zone recognition. PHOG features has been compared with other existing features and found robust in Indic script recognition. An exhaustive experiment is performed on two Indic scripts namely, Bangla and Devanagari for the performance evaluation. From the experiment, it has been noted that proposed zone-wise recognition improves accuracy with respect to the traditional way of Indic word recognition.

Citation

BibTeX citation:
@article{pratim_roy2016,
  author = {Pratim Roy, Partha and Kumar Bhunia, Ayan and Das, Ayan and
    Dey, Prasenjit and Pal, Umapada},
  title = {HMM-Based {Indic} Handwritten Word Recognition Using Zone
    Segmentation},
  journal = {Pattern Recognition (PR)},
  volume = {60},
  pages = {1057 - 1075},
  date = {2016-12-01},
  url = {https://arxiv.org/abs/1708.00227},
  langid = {en}
}
For attribution, please cite this work as:
Pratim Roy, Partha, Ayan Kumar Bhunia, Ayan Das, Prasenjit Dey, and Umapada Pal. 2016. “HMM-Based Indic Handwritten Word Recognition Using Zone Segmentation.” Pattern Recognition (PR) 60 (December): 1057–75. https://arxiv.org/abs/1708.00227.