Ground Truth for LRDE DBD text line localization
Revision as of 17:15, 30 May 2013 by Liwicki
Created: 2010-08-03
Keywords
scanned, magazine, documents, text line localization
Description
The text line localization information was produced by applying text line localization algorithms. Each text line is assigned a size category according to its x-height, using the following rule: 0 < small <= 30 < medium <= 55 < large < +inf. The ground truth contains:
- 123 large text line localizations (clean).
- 320 medium text line localizations (clean).
- 9551 small text line localizations (clean).
- 123 large text line localizations (original).
- 320 medium text line localizations (original).
- 9551 small text line localizations (original).
- 123 large text line localizations (scanned).
- 320 medium text line localizations (scanned).
- 9551 small text line localizations (scanned).
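The size rule stated in the description can be sketched in a few lines of Python. This is only an illustration: the function name `size_category` is hypothetical, and the unit of the x-height is not stated on this page, so the thresholds 30 and 55 are applied to whatever unit the dataset uses.

```python
def size_category(x_height: float) -> str:
    """Classify a text line by x-height: 0 < small <= 30 < medium <= 55 < large.

    Hypothetical helper; the unit of x_height is an assumption left to the caller.
    """
    # The rule only covers strictly positive x-heights (this also rejects NaN).
    if not x_height > 0:
        raise ValueError("x-height must be strictly positive")
    if x_height <= 30:
        return "small"
    if x_height <= 55:
        return "medium"
    return "large"
```

Note that both thresholds are inclusive on the lower category: an x-height of exactly 30 is small, and exactly 55 is medium.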
The text lines dataset covers only a subset of the full-document dataset. It is generated from the binarization of the full-document images. Text line localizations are stored as bounding box coordinates in text files.
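As a rough sketch of how such text files might be read, the following Python snippet parses one bounding box per line. The exact file layout is not documented on this page, so the format used here (four whitespace-separated integers in the order x_min, y_min, x_max, y_max) is purely an assumption, and the names `BoundingBox` and `parse_bounding_boxes` are hypothetical. Check the actual files and adjust the field order before relying on it.

```python
from typing import List, NamedTuple


class BoundingBox(NamedTuple):
    # Assumed field order; the real files may use a different layout.
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_bounding_boxes(text: str) -> List[BoundingBox]:
    """Parse one bounding box per non-empty line (assumed format)."""
    boxes = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 values, got {len(fields)}"
            )
        boxes.append(BoundingBox(*map(int, fields)))
    return boxes
```

The function takes the file contents as a string, so it can be used as `parse_bounding_boxes(open(path).read())` for a hypothetical `path` to one of the ground-truth files.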
Purpose of the three document qualities:
- Original: evaluate the binarization quality on perfect documents mixing text and images.
- Clean: evaluate the binarization quality on perfect documents with text only.
- Scanned: evaluate the binarization quality on slightly degraded documents with text only.
Related Dataset
- [[]]
Related Tasks
- none
Submitted Files