Letter-block Recognition in Arabic
Description
The objective is to assess the performance of an Arabic recognition system at the level of letter-blocks.
The recognition error at the letter-block level is evaluated by cross-validation over repeated random splits of the dataset. For a single letter-block, the recognition error is defined as:
ERR(s, t_s) = STRING_DISTANCE(s, t_s)/STRING_LENGTH(t_s)
where:
- s: the recognized label of a letter-block.
- t_s: the associated true label of that letter-block.
- STRING_DISTANCE(s, t_s): the string distance between the two strings s and t_s.
- STRING_LENGTH(t_s): the number of letters in the string t_s.
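As an illustration of the per-block error, the following minimal sketch assumes that STRING_DISTANCE is the Levenshtein (edit) distance and that each character of a label corresponds to one letter. Neither assumption is stated on this page, and the function names `levenshtein` and `block_error` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (assumed choice of STRING_DISTANCE)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def block_error(s: str, t_s: str) -> float:
    """ERR(s, t_s) = STRING_DISTANCE(s, t_s) / STRING_LENGTH(t_s)."""
    if not t_s:
        raise ValueError("the true label t_s must not be empty")
    # Assumption: len() counts letters, i.e. one character per letter.
    return levenshtein(s, t_s) / len(t_s)
```

Under these assumptions a perfectly recognized block has error 0, and the error can exceed 1 when the recognized label is much longer than the true one.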
The proposed evaluation protocol is the following. In each run, the dataset is randomly divided into a training subset and a test subset, containing 80% and 20% of the letter-blocks respectively. Letter-blocks labeled as 'NaN' are ignored. The target recognition system is trained on the training subset, and its performance is then evaluated on the test subset in terms of the recognition error:
ERR = AVERAGE{ERR(s, t_s), for all s in the test subset}
The process is repeated 10 times, and the average of the 10 recognition errors is considered as the performance of the target method.
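The protocol above could be sketched as follows. This is only an outline under stated assumptions: the names `evaluate`, `train_fn` and `block_error` are hypothetical, the recognizer is represented as a function that takes the training samples and returns a prediction function, and the per-block error ERR(s, t_s) is passed in as a callable. The fixed random seed is an addition for reproducibility, not part of the protocol.

```python
import random
from statistics import mean


def evaluate(samples, train_fn, block_error, runs=10, test_fraction=0.2, seed=0):
    """Average recognition error over repeated random 80%/20% splits.

    samples:     list of (letter_block, true_label) pairs
    train_fn:    hypothetical trainer; takes training pairs, returns predict(block)
    block_error: callable computing ERR(s, t_s) for one letter-block
    """
    rng = random.Random(seed)  # assumption: seeded for reproducibility
    # Letter-blocks labeled 'NaN' are ignored.
    usable = [(x, t) for x, t in samples if t != "NaN"]
    n_test = round(len(usable) * test_fraction)
    if n_test == 0:
        raise ValueError("not enough labeled letter-blocks for a test subset")

    run_errors = []
    for _ in range(runs):
        shuffled = usable[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = train_fn(train)
        # ERR for this run: average of ERR(s, t_s) over the test subset.
        run_errors.append(mean(block_error(predict(x), t) for x, t in test))
    # Reported performance: average of the per-run errors.
    return mean(run_errors)
```

Each run draws a fresh random split, so the test subsets of different runs may overlap.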
Related Dataset and Ground Truth Data
- IBN SINA: A database for research on processing and understanding of Arabic manuscripts images (originally proposed for v2.0 of the dataset)
References
- Reza Farrahi Moghaddam, Mohamed Cheriet, Mathias M. Adankon, Kostyantyn Filonenko, and Robert Wisnovsky, “IBN SINA: A database for research on processing and understanding of Arabic manuscripts images”, Proceedings of DAS’10, June 9-11, 2010, Boston, MA, USA