IAMonDo - Hierarchical Layout and Full Transcription


Revision as of 17:17, 28 June 2010


Created: 2010-06-28
Last updated: 2010-06-28

Keywords

Layout analysis, text detection, word spotting, handwriting recognition, recognition of document annotations, text and non-text distinction

Description

The following entities have been identified, annotated, and transcribed as ground truth: “text block”, “list”, “diagram”, “table”, “drawing”, “formula”, “text line”, “word”, “arrow”, “structuring element”, “correction”, and several types of markings. These elements are structured hierarchically such that, for example, the root entity “text block” contains “text line” elements, which in turn contain “word” elements.

The annotation was carried out manually. A software tool created for this purpose (InkAnno) allows one to select multiple strokes, group them into an entity, select a label that specifies the content type, and type in the transcription. During annotation, the document templates were used as guides for the segmentation and transcription, which improved the accuracy of the ground truth.

The ground truth is stored along with the digital ink in the documents of the IAMonDo dataset.

For the annotation of the data in InkML the <traceView> element has been used. Trace view elements can refer to actual traces or can contain other trace views. This allows one to create a hierarchically structured view of the digital ink contained in a document.
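The nesting described above can be illustrated with a minimal Python sketch. The sample document is hypothetical and not taken from the dataset; it assumes the standard InkML namespace and that leaf trace views point at traces through the `traceDataRef` attribute. The helper name `collect_trace_refs` is likewise an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Standard InkML namespace; the sample below is hypothetical, not from the dataset.
INKML_NS = "{http://www.w3.org/2003/InkML}"

SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace xml:id="t1">0 0, 1 1</trace>
  <trace xml:id="t2">2 0, 3 1</trace>
  <trace xml:id="t3">5 5, 6 6</trace>
  <traceView>
    <traceView>
      <traceView traceDataRef="#t1"/>
      <traceView traceDataRef="#t2"/>
    </traceView>
    <traceView traceDataRef="#t3"/>
  </traceView>
</ink>"""


def collect_trace_refs(view):
    """Return the ids of all traces reachable from a traceView, depth first."""
    refs = []
    ref = view.get("traceDataRef")
    if ref is not None:
        # A leaf view refers to an actual trace, e.g. "#t1".
        refs.append(ref.lstrip("#"))
    for child in view.findall(INKML_NS + "traceView"):
        # A nested view contributes everything it contains.
        refs.extend(collect_trace_refs(child))
    return refs


root = ET.fromstring(SAMPLE)
top_view = root.find(INKML_NS + "traceView")
all_refs = collect_trace_refs(top_view)
first_group_refs = collect_trace_refs(top_view.find(INKML_NS + "traceView"))
print(all_refs, first_group_refs)
```

Each trace view thus stands for the set of strokes found anywhere beneath it, which is what lets a single document carry several levels of grouping at once.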

The <traceView> elements accept multiple name-value pairs as annotations. The annotation named “type” specifies the content type, and the annotation named “transcription” holds the transcription, e.g. for words. The structure of the annotation follows a strict definition, which is specified in the file “annotationStructure.xml”.
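As a rough sketch of how such annotations might be read, the Python example below walks a nested trace view and lists each entity with its type and transcription. It assumes that the name-value pairs are encoded as InkML `<annotation>` elements whose `type` attribute carries the name. The sample document, the label spellings ("Textblock", "Textline", "Word"), and the helper names are assumptions made for illustration; the authoritative labels are those defined in “annotationStructure.xml”, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

INKML_NS = "{http://www.w3.org/2003/InkML}"

# Hypothetical sample; label spellings are assumed, not taken from the dataset.
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace xml:id="t1">0 0, 1 1</trace>
  <trace xml:id="t2">2 0, 3 1</trace>
  <traceView>
    <annotation type="type">Textblock</annotation>
    <traceView>
      <annotation type="type">Textline</annotation>
      <annotation type="transcription">hello world</annotation>
      <traceView>
        <annotation type="type">Word</annotation>
        <annotation type="transcription">hello</annotation>
        <traceView traceDataRef="#t1"/>
      </traceView>
      <traceView>
        <annotation type="type">Word</annotation>
        <annotation type="transcription">world</annotation>
        <traceView traceDataRef="#t2"/>
      </traceView>
    </traceView>
  </traceView>
</ink>"""


def annotations_of(view):
    """Map annotation name -> value for the direct annotations of a view."""
    return {
        a.get("type"): (a.text or "")
        for a in view.findall(INKML_NS + "annotation")
    }


def walk(view, depth=0):
    """Yield (depth, content type, transcription) for every annotated view."""
    notes = annotations_of(view)
    if "type" in notes:
        yield depth, notes["type"], notes.get("transcription")
    for child in view.findall(INKML_NS + "traceView"):
        yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE)
entities = list(walk(root.find(INKML_NS + "traceView")))
for depth, label, text in entities:
    print("  " * depth + label, "" if text is None else repr(text))
```

Views without a “type” annotation, such as the leaf views that only reference a trace, are skipped, so the output mirrors the entity hierarchy rather than the raw stroke list.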

Related Dataset

Related Tasks

Submitted Files

The ground truth is stored along with the digital ink in the documents of the IAMonDo dataset. The dataset can be downloaded from the IAM Online Document Database (IAMonDo-database) page.


