Difference between revisions of "Ground Truth for LRDE DBD text line localization"

From TC11
(Created page with "Datasets -> Datasets List -> Current Page {| style="width: 100%" |- | align="right" | {| |- | '''Created: '''2010-08-03 |- | {{Last updated}} |} |} =Keywords= scann…")
 
 
(3 intermediate revisions by the same user not shown)
Line 7:
  {|
  |-
- | '''Created: '''2010-08-03
+ | '''Created: '''2013-05-30
  |-
  | {{Last updated}}
Line 43:

  =Related Dataset=
- * [[]]
+ * [[LRDE Document Binarization Dataset (LRDE DBD)]]

  =Related Tasks=
Line 49:

  =Submitted Files=
- *
+ ==Version 1.0==
+ * [http://www.iapr-tc11.org/dataset/LRDE/nouvel_obs_2402_textlines-1.0.zip Text lines localization] (9.8 Mb)

  ----
  This page is editable only by [[IAPR-TC11:Reading_Systems#TC11_Officers|TC11 Officers ]].

Latest revision as of 16:08, 3 July 2013


Created: 2013-05-30
Last updated: 2013-07-03

Keywords

scanned, magazine, documents, text line localization


Description

Text line localization information was produced by applying text line localization algorithms. Each text line is assigned a size category based on its x-height, according to the following rule: 0 < small <= 30 < medium <= 55 < large < +inf

  • 123 large text line localizations (clean).
  • 320 medium text line localizations (clean).
  • 9551 small text line localizations (clean).
  • 123 large text line localizations (original).
  • 320 medium text line localizations (original).
  • 9551 small text line localizations (original).
  • 123 large text line localizations (scanned).
  • 320 medium text line localizations (scanned).
  • 9551 small text line localizations (scanned).
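
The size rule given in the description can be sketched as a small function. This is only an illustration: the name `size_category` is hypothetical, and the page does not state the unit of the x-height, so the thresholds 30 and 55 are used exactly as written.

```python
def size_category(x_height: float) -> str:
    """Map an x-height to a size category.

    Follows the rule 0 < small <= 30 < medium <= 55 < large < +inf.
    The unit of x_height is not stated on the page; the thresholds
    are taken as given.
    """
    if x_height <= 0:
        # The rule only covers strictly positive x-heights.
        raise ValueError("x-height must be strictly positive")
    if x_height <= 30:
        return "small"
    if x_height <= 55:
        return "medium"
    return "large"
```

Note that both upper bounds are inclusive, so an x-height of exactly 30 is small and exactly 55 is medium.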

The text lines dataset covers only a subset of the full-document dataset. It is generated from the binarization of the full-document images. Text line localizations are stored as bounding box coordinates in text files.
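
A minimal sketch of reading such a file is shown below. The exact file layout is not documented on this page, so this assumes (hypothetically) one bounding box per line, written as four whitespace-separated integers in the order x_min, y_min, x_max, y_max. The names `BBox` and `parse_bboxes` are illustrative; check the files in the archive before relying on this layout.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (assumed field order, not confirmed by the page)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_bboxes(text: str) -> List[BBox]:
    """Parse bounding boxes from text, one box per non-empty line.

    Assumption: each line holds four whitespace-separated integers.
    """
    boxes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_number}: expected 4 values, got {len(fields)}")
        boxes.append(BBox(*(int(field) for field in fields)))
    return boxes
```

If the real files use a different separator or coordinate order, only the body of `parse_bboxes` needs to change.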


Purpose of the three document qualities:

  • Original: evaluate the binarization quality on perfect documents mixing text and images.
  • Clean: evaluate the binarization quality on perfect documents with text only.
  • Scanned: evaluate the binarization quality on slightly degraded documents with text only.

Related Dataset

  • LRDE Document Binarization Dataset (LRDE DBD)

Related Tasks

  • none

Submitted Files

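Version 1.0

  • Text lines localization (9.8 Mb): http://www.iapr-tc11.org/dataset/LRDE/nouvel_obs_2402_textlines-1.0.zip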


This page is editable only by TC11 Officers.