Devanagari Character Dataset

From TC11
Revision as of 18:03, 27 January 2011 by Dimos (talk | contribs)

Created: 2010-11-19
Last updated: 2011-01-27

Contact Author

Santosh K. C.
INRIA Nancy Grand Est Research Centre
LORIA Campus Scientifique
BP - 239, 54506 Vandoeuvre-les-Nancy Cedex, FRANCE
E-mail: Santosh.KC@inria.fr

Current Version

1.0

Keywords

Online handwriting, Devanagari, On-line Character Recognition

Description

Six different samples of the character 'ka'. Red dots denote the start of each stroke.

This dataset of on-line handwritten Devanagari characters is composed of 1800 samples from 36 character classes, collected from 25 native writers. Each writer was asked to provide two samples per class.

No specific directions, constraints, or instructions were given to the writers, with the aim of building a database of completely natural handwriting.

For data collection we used a simple Graphire tablet (WACOM ET0405A-U), which captures the pen-tip position as 2D coordinates.

Metadata and Technical Details

Each character is stored in a separate file, and the files are text-based, in comma-separated values format. Each file is approximately 4 KB on average (the actual size varies with the number and size of the strokes comprising the character).
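
A minimal sketch of reading one character file, assuming each line holds a single "x,y" pair of floating-point values. The exact column layout is not specified on this page, so both that layout and the function name read_points are assumptions, and the coordinates in the usage lines are invented rather than taken from the dataset.

```python
import csv
import io
from typing import List, TextIO, Tuple

Point = Tuple[float, float]


def read_points(handle: TextIO) -> List[Point]:
    """Read (x, y) pen-tip positions from a comma-separated text stream.

    Assumes one "x,y" pair per line (an assumption, not confirmed by the
    dataset description). Blank lines are skipped.
    """
    points: List[Point] = []
    for row in csv.reader(handle):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        points.append((float(row[0]), float(row[1])))
    return points


# Usage with invented coordinates (not taken from the dataset).
sample = io.StringIO("10.0,12.5\n11.0,13.0\n-1.0,-1.0\n")
print(read_points(sample))
```

For a file on disk, the same function can be called on an opened file object, for example inside a `with open(path, newline="") as handle:` block.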

The dataset is organised in folders that reflect the 36 classes. Inside each class folder there are 50 samples. For every writer there are two samples per class denoted by userX_1 and userX_2.
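
The folder layout described above can be sanity-checked with a short script. This is a sketch under stated assumptions: the root folder contains one sub-folder per class and nothing else relevant, and every file in a class folder is a sample. The helper names index_dataset and check_counts are hypothetical and not part of the dataset.

```python
from pathlib import Path
from typing import Dict, List


def index_dataset(root: Path) -> Dict[str, List[Path]]:
    """Map each class folder name to the sorted list of files inside it."""
    index: Dict[str, List[Path]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        index[class_dir.name] = sorted(
            f for f in class_dir.iterdir() if f.is_file()
        )
    return index


def check_counts(
    index: Dict[str, List[Path]], classes: int = 36, per_class: int = 50
) -> List[str]:
    """Return a list of readable problems; the list is empty if counts match."""
    problems: List[str] = []
    if len(index) != classes:
        problems.append(
            f"expected {classes} class folders, found {len(index)}"
        )
    for name, files in index.items():
        if len(files) != per_class:
            problems.append(
                f"class {name!r}: expected {per_class} samples, "
                f"found {len(files)}"
            )
    return problems
```

The defaults of 36 classes and 50 samples per class come from the description (25 writers, two samples each); an empty result from check_counts means the local copy matches those figures.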

The digitizer captures a series of strokes during pen movement. A sequence of coordinates (pen-tip positions) recorded from pen-down to pen-up represents one stroke.

For simplicity, we have inserted the special value [-1.0, -1.0] to indicate the termination of a stroke, which makes it easier to count and separate the strokes of a complete character. The following is an example for a two-stroke character. It is important to note that a series of [-1.0, -1.0] values can be recorded when the writer's hand trembles, as well as when the pen-tip hovers just above the surface of the pad. Pre-processing is left to the end-user.
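
One possible pre-processing step is to split the point sequence at each [-1.0, -1.0] marker and to ignore repeated markers, so that tremor or hovering does not produce empty strokes. The sketch below assumes the points have already been parsed into (x, y) tuples; the function name split_strokes is hypothetical, and the coordinates are invented for illustration rather than taken from the dataset.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
PEN_UP: Point = (-1.0, -1.0)  # stroke terminator described in the text


def split_strokes(points: Sequence[Point]) -> List[List[Point]]:
    """Split a flat point sequence into strokes at each PEN_UP marker.

    Consecutive markers are collapsed, so no empty strokes are returned.
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for point in points:
        if point == PEN_UP:
            if current:  # ignore repeated or leading markers
                strokes.append(current)
                current = []
        else:
            current.append(point)
    if current:  # keep a final stroke that lacks a closing marker
        strokes.append(current)
    return strokes


# Invented two-stroke character with a repeated marker between strokes.
raw = [
    (10.0, 12.0), (11.0, 13.5),
    (-1.0, -1.0), (-1.0, -1.0),
    (20.0, 5.0), (21.0, 6.0),
    (-1.0, -1.0),
]
strokes = split_strokes(raw)
print(len(strokes))  # 2
```

Collapsing repeated markers is a design choice of this sketch; an application that needs to know how long the pen hovered would have to keep them.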

Related Ground Truth Data

N/A

Related Tasks

References

  1. Santosh K.C., Cholwich Nattee, Bart Lamiroy, 'Spatial Similarity based Stroke Number and Order Free Clustering', IAPR, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR), Kolkata, India, 2010
  2. Santosh K.C., Cholwich Nattee, 'Template-based Nepali Handwritten Alphanumeric Character Recognition', Thammasat International Journal of Science and Technology (TIJSAT), Thailand, Vol. 12, No. 1, pp. 20 - 30, 2007
  3. Santosh K.C., Cholwich Nattee, 'Stroke Number and Order Free Handwriting Recognition for Nepali', 9th Pacific Rim International Conference on Artificial Intelligence (PRICAI), Springer - Lecture Notes in Computer Science (LNCS), Subseries: Lecture Notes in Artificial Intelligence (LNAI), Guilin, China, Vol. 4099, pp. 990-994, August 7 - 11, 2006
  4. Santosh K.C., Cholwich Nattee, 'Structural Approach on Writer Independent Nepalese Natural Handwriting Recognition', IEEE, International Conference on Cybernetics & Intelligent Systems (CIS), Bangkok, Thailand, pp. 711-716, June 7 - 9, 2006
  5. Santosh K.C., Cholwich Nattee, 'Effect of Pre-processing and Feature Selection in Recognition for Nepali', International Conference on Knowledge, Information, Creativity, and Support Systems (KICSS), Ayuthya, Thailand, pp. 139-146, August 1 - 4, 2006

Submitted Files

Version 1.0

Files

