Ground Truth for LRDE DBD text line localization
From TC11
Keywords
scanned, magazine, documents, text line localization
Description
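Text line localization information was produced by applying text line localization algorithms. Each text line is assigned a size category according to its x-height, using the following rule: 0 < small <= 30 < medium <= 55 < large < +inf. That is, a line is small if its x-height is at most 30, medium if it is above 30 and at most 55, and large if it is above 55. The following localizations are provided: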
- 123 large text lines localization (clean).
- 320 medium text lines localization (clean).
- 9551 small text lines localization (clean).
- 123 large text lines localization (original).
- 320 medium text lines localization (original).
- 9551 small text lines localization (original).
- 123 large text lines localization (scanned).
- 320 medium text lines localization (scanned).
- 9551 small text lines localization (scanned).
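The size rule stated in the description can be sketched as a small function. This is only an illustration of the thresholds given above. The function name `size_category` is hypothetical and not part of the dataset, and the x-height is assumed to be supplied in the same unit the thresholds use.

```python
def size_category(x_height: float) -> str:
    """Map an x-height to 'small', 'medium', or 'large'.

    Thresholds follow the rule on this page:
    0 < small <= 30 < medium <= 55 < large < +inf
    """
    if x_height <= 0:
        # The rule only covers strictly positive x-heights.
        raise ValueError("x-height must be positive")
    if x_height <= 30:
        return "small"
    if x_height <= 55:
        return "medium"
    return "large"
```

Note that both boundaries are inclusive on the lower category: an x-height of exactly 30 is small, and exactly 55 is medium.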
The text lines dataset covers only a subset of the full-document dataset. It is generated from the binarization of the full-document images. Text line localizations are stored as bounding box coordinates in text files.
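As a rough sketch of how such files might be read: the exact layout of the text files is not described on this page, so the format below is purely an assumption for illustration (one bounding box per line, as four whitespace-separated integers in `x_min y_min x_max y_max` order). The names `Box` and `parse_boxes` are hypothetical. Check the downloaded files and adapt the field order before relying on this.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    # Assumed field order; the real files may differ.
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def parse_boxes(text: str) -> List[Box]:
    """Parse bounding boxes from text, one box per line (assumed format)."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines or lines with another layout
        try:
            boxes.append(Box(*(int(f) for f in fields)))
        except ValueError:
            continue  # skip lines whose fields are not integers
    return boxes
```

For example, `parse_boxes("10 20 110 45\n\n12 60 300 88\n")` yields two `Box` values and silently skips the blank line.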
Purpose of the three document qualities:
- Original: evaluate the binarization quality on perfect documents mixing text and images.
- Clean: evaluate the binarization quality on perfect documents with text only.
- Scanned: evaluate the binarization quality on slightly degraded documents with text only.
Related Dataset
Related Tasks
- none
Submitted Files
Version 1.0
- Text lines localization (9.8 MB)