Keywords

scanned, magazine, documents, text line localization

Description

Text line localization information was produced by applying text line localization algorithms. Each text line is assigned a size category based on its x-height, according to the following rule: 0 < small <= 30 < medium <= 55 < large < +inf

123 large text line localizations (clean).
320 medium text line localizations (clean).
9551 small text line localizations (clean).
123 large text line localizations (original).
320 medium text line localizations (original).
9551 small text line localizations (original).
123 large text line localizations (scanned).
320 medium text line localizations (scanned).
9551 small text line localizations (scanned).
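
The size rule stated in the description can be sketched as a small function. This is only an illustration: the name `size_category` is hypothetical and not part of the dataset, and the unit of the x-height (presumably pixels) is an assumption, since the page does not state it.

```python
def size_category(x_height: float) -> str:
    """Map an x-height to a size category.

    Follows the rule: 0 < small <= 30 < medium <= 55 < large < +inf.
    The unit of x_height is assumed to be the one used by the dataset
    (presumably pixels); the page does not state it.
    """
    if x_height <= 0:
        # The rule only covers strictly positive x-heights.
        raise ValueError("x-height must be strictly positive")
    if x_height <= 30:
        return "small"
    if x_height <= 55:
        return "medium"
    return "large"
```

Both boundary values belong to the lower category: an x-height of exactly 30 is small, and exactly 55 is medium.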

The text lines dataset covers only a subset of the full-document dataset. It is generated from the binarization of the full-document images. Text line localizations are stored as bounding box coordinates in text files.

Purpose of the three document qualities:

Original: evaluate the binarization quality on perfect documents mixing text and images.
Clean: evaluate the binarization quality on perfect documents with text only.
Scanned: evaluate the binarization quality on slightly degraded documents with text only.

Related Dataset

[[]]

Related Tasks

none

Submitted Files


Ground Truth for LRDE DBD text line localization
