Letter-block Recognition in Arabic



Created: 2011-10-02
Last updated: 2011-10-03

Proposed By

Prof Mohamed Cheriet
 Synchromedia Laboratory
 ETS, Montréal, (QC) Canada
 H3C 1K3
 E-mail: mohamed.cheriet@etsmtl.ca
 Tel: +1(514)396-8972
 Fax: +1(514)396-8595

Description

The objective is to assess the performance of an Arabic recognition system at the letter-block level.

The recognition error at the letter-block level is evaluated by cross-validation over repeated random splits. For a single letter-block, the recognition error is defined as:

ERR(s, t_s) = STRING_DISTANCE(s, t_s)/STRING_LENGTH(t_s)

where

s: the recognized label of a letter-block.
t_s: the associated true label of that letter-block.
STRING_DISTANCE(s, t_s): the string distance between two strings s and t_s.
STRING_LENGTH(t_s): the number of letters in the string t_s.
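
The per-letter-block error above can be sketched in Python. The page does not say which string distance is meant, so this sketch assumes the Levenshtein (edit) distance; the function names `levenshtein` and `block_error` are hypothetical and chosen only for illustration. STRING_LENGTH is taken to be the number of characters in the true label, which is also an assumption about how labels are encoded.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions.

    ASSUMPTION: used here as STRING_DISTANCE; the source does not name
    a specific string distance.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def block_error(s: str, t_s: str) -> float:
    """ERR(s, t_s) = STRING_DISTANCE(s, t_s) / STRING_LENGTH(t_s)."""
    if not t_s:
        raise ValueError("the true label t_s must be non-empty")
    return levenshtein(s, t_s) / len(t_s)
```

Because the distance is normalized by the length of the true label only, the error can exceed 1 when the recognized label is much longer than the true one.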

The proposed evaluation protocol is as follows. In each run, the dataset is randomly divided into two subsets: a training subset containing 80% of the data and a test subset containing the remaining 20%. Letter-blocks labeled 'NaN' are ignored. The target recognition system is trained on the training subset, and its performance is then evaluated on the test subset in terms of the average recognition error:

ERR = AVERAGE{ERR(s, t_s), for all s in the test subset}

The process is repeated 10 times, and the average of the 10 recognition errors is taken as the performance of the target method.
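
The protocol described above can be sketched as a short Python routine. Everything named here is hypothetical: `evaluate`, `train_fn` (which takes a training subset and returns a recognizer) and `error_fn` (the per-letter-block error ERR(s, t_s)) are placeholders for whatever system is being tested. The sketch also assumes that ignored letter-blocks carry the literal string label 'NaN' and that they are filtered out before splitting; the page does not specify either detail.

```python
import random
from statistics import mean


def evaluate(samples, train_fn, error_fn, runs=10, train_fraction=0.8, seed=0):
    """Average recognition error over repeated random 80/20 splits.

    samples:  list of (letter_block, true_label) pairs.
    train_fn: hypothetical callable; takes the training pairs and returns
              a function mapping a letter-block to a recognized label.
    error_fn: hypothetical callable computing ERR(s, t_s).
    """
    rng = random.Random(seed)
    # ASSUMPTION: ignored blocks are marked with the string "NaN".
    usable = [(x, t) for x, t in samples if t != "NaN"]
    run_errors = []
    for _ in range(runs):
        shuffled = usable[:]
        rng.shuffle(shuffled)
        cut = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:cut], shuffled[cut:]
        recognize = train_fn(train)
        # ERR for this run: mean per-block error over the test subset.
        run_errors.append(mean(error_fn(recognize(x), t) for x, t in test))
    # Reported performance: mean of the per-run errors.
    return mean(run_errors)
```

Fixing the random seed, as done here, makes the ten splits reproducible; the page itself does not prescribe a seed or a particular random generator.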

Related Dataset and Ground Truth Data

References

  1. Reza Farrahi Moghaddam, Mohamed Cheriet, Mathias M. Adankon, Kostyantyn Filonenko, and Robert Wisnovsky, “IBN SINA: A database for research on processing and understanding of Arabic manuscripts images”, Proceedings of DAS’10, June 9-11, 2010, Boston, MA, USA


