Description

The objective is to assess the performance of an Arabic recognition system at the letter-block level.

The recognition error at the letter-block level is evaluated using a cross-validation procedure. For a single letter-block, the recognition error is defined as:

ERR(s, t_s) = STRING_DISTANCE(s, t_s)/STRING_LENGTH(t_s)

where

s: the recognized label of a letter-block.
t_s: the associated true label of that letter-block.
STRING_DISTANCE(s, t_s): the string distance between two strings s and t_s.
STRING_LENGTH(t_s): the number of letters in the string t_s.
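The page does not say which string distance STRING_DISTANCE is. The sketch below assumes it is the Levenshtein (edit) distance, meaning the minimum number of letter insertions, deletions, and substitutions needed to turn s into t_s. It also assumes that labels are stored as Python strings with one code point per letter, so `len(t_s)` equals STRING_LENGTH(t_s). That assumption fails if labels carry diacritics or other combining marks. The function names `levenshtein` and `block_error` are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (assumed STRING_DISTANCE): minimum number of
    single-letter insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def block_error(s: str, t_s: str) -> float:
    """ERR(s, t_s) = STRING_DISTANCE(s, t_s) / STRING_LENGTH(t_s).
    Assumes one code point per letter, so len() counts letters."""
    if not t_s:
        raise ValueError("the true label t_s must be non-empty")
    return levenshtein(s, t_s) / len(t_s)


# Recognized "كتب" against true label "كتاب": one missing letter out of four.
print(block_error("كتب", "كتاب"))  # 0.25
```

Because the distance is divided by the length of the true label only, ERR can exceed 1 when the recognized string is much longer than t_s.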

The proposed evaluation protocol is as follows. In each run, the dataset is randomly divided into two subsets: a training subset containing 80% of the data and a test subset containing the remaining 20%. Letter-blocks labeled 'NaN' are ignored. The target recognition system is trained on the training subset, and its performance is then evaluated on the test subset in terms of the average recognition error:

ERR = AVERAGE{ERR(s, t_s), for all s in the test subset}

The process is repeated 10 times, and the average of the 10 recognition errors is taken as the performance of the target method.
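The repeated 80/20 protocol can be sketched as follows. Several details here are assumptions and are not stated on the page. The string distance is taken to be the Levenshtein distance. Blocks labeled 'NaN' are dropped once, before any splitting, so the 80/20 proportions apply to the remaining blocks. The recognizer is supplied through a hypothetical `train_fn` callback: it receives the training pairs and returns a function that maps one letter-block to a recognized label. The names `evaluate` and `train_fn` are illustrative, not part of any official toolkit.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[object, str]  # (letter-block input, true label t_s)
TrainFn = Callable[[Sequence[Sample]], Callable[[object], str]]  # hypothetical


def levenshtein(a: str, b: str) -> int:
    """Assumed STRING_DISTANCE: Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def block_error(s: str, t_s: str) -> float:
    """ERR(s, t_s) = STRING_DISTANCE(s, t_s) / STRING_LENGTH(t_s)."""
    return levenshtein(s, t_s) / len(t_s)


def evaluate(dataset: Sequence[Sample], train_fn: TrainFn, runs: int = 10,
             train_fraction: float = 0.8, seed: int = 0) -> float:
    """Average, over `runs` random splits, of the mean per-block ERR on the test subset."""
    rng = random.Random(seed)
    # Ignore letter-blocks labeled 'NaN' (assumed: removed before splitting).
    usable: List[Sample] = [(x, t) for x, t in dataset if t != "NaN"]
    run_errors = []
    for _ in range(runs):
        shuffled = list(usable)
        rng.shuffle(shuffled)  # a fresh random split in every run
        n_train = int(round(train_fraction * len(shuffled)))
        train, test = shuffled[:n_train], shuffled[n_train:]
        recognize = train_fn(train)  # train on the 80% subset
        # ERR for this run: mean per-block error over the 20% test subset.
        run_errors.append(mean(block_error(recognize(x), t) for x, t in test))
    return mean(run_errors)  # final score: average of the per-run errors
```

In a toy check where each input is its own label, a recognizer that echoes its input scores 0.0. A recognizer that always outputs the empty string scores 1.0, because each block then costs exactly STRING_LENGTH(t_s) insertions. The test subset must be non-empty, or `mean` raises `StatisticsError`.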

Related Dataset and Ground Truth Data

IBN SINA: A database for research on processing and understanding of Arabic manuscripts images (originally proposed for v2.0 of the dataset)

References

Reza Farrahi Moghaddam, Mohamed Cheriet, Mathias M. Adankon, Kostyantyn Filonenko, and Robert Wisnovsky, “IBN SINA: A database for research on processing and understanding of Arabic manuscripts images”, Proceedings of DAS’10, June 9-11, 2010, Boston, MA, USA
