KAIST Scene Text Ground Truth (text location, segmentation and recognition)

From TC11

Created: 2011-01-11
Last updated: 2011-01-28


Scene Text, Korean, English, Signboard, Mobile phone, Indoor image, Outdoor image


Two aspects of ground truth are provided for the KAIST Scene Text dataset.

First, each image has an XML file that contains the locations (as bounding boxes) of single characters or single words and their transcriptions, along with global information about the image.

In detail, the XML file includes the following information:

  • Image name and size
  • Word locations
  • Illumination condition. The <illumination> tag is set to "yes" if the text regions are affected by non-uniform illumination, whether from artificial light or sunlight; otherwise it is set to "no". These assessments are subjective, so use them with care.
  • A difficulty rating from 1 to 5, as judged by the person who handled the image. The rating is subjective and is not meant to be an accurate measure.
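As a rough illustration of reading the illumination flag, the sketch below uses Python's standard xml.etree.ElementTree module. Only the <illumination> tag name comes from the description above. The surrounding <image> element, the function name illumination_flags, and the assumption that "yes"/"no" is stored as element text (rather than, say, an attribute) are all hypothetical and should be checked against the actual files.

```python
import xml.etree.ElementTree as ET


def illumination_flags(xml_text):
    """Return one boolean per <illumination> element found in the XML."""
    root = ET.fromstring(xml_text)
    flags = []
    for node in root.iter("illumination"):
        # Assumption: the yes/no value is the element's text content.
        # The real ground truth files may store it differently.
        value = (node.text or "").strip().lower()
        flags.append(value == "yes")
    return flags


# Hypothetical sample document; the real schema is not reproduced here.
sample = "<image><illumination>yes</illumination></image>"
print(illumination_flags(sample))
```

Because the flag is a subjective assessment, it is probably best treated as a coarse filter rather than a precise label.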

Second, in addition to the XML file, a bitmap image is provided in which the text has been manually segmented at the pixel level. In the bitmap files, light red or white marks text pixels and black marks background pixels.
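The colour convention can be turned into a binary mask along the following lines. This is a minimal sketch in plain Python that works on rows of (r, g, b) tuples; loading the BMP file itself is left to whatever imaging library is at hand. The function name text_mask, the threshold of 128 on the red channel, and the sample colour used for "light red" are assumptions, since the exact pixel values are not stated above.

```python
def text_mask(pixels, threshold=128):
    """Convert rows of (r, g, b) tuples into rows of 0/1, where 1 = text.

    Assumption: both white and light red have a high red channel, while
    black background pixels do not, so the red channel alone separates them.
    """
    return [
        [1 if r >= threshold else 0 for (r, g, b) in row]
        for row in pixels
    ]


# Hypothetical 1x3 image: white, black, and an assumed "light red" value.
sample_pixels = [[(255, 255, 255), (0, 0, 0), (255, 128, 128)]]
print(text_mask(sample_pixels))
```

If a distinction between white and light red pixels matters for a given task, the green and blue channels would need to be inspected as well.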

The ground truth files have the same name as the original image, with the extensions XML and BMP respectively.
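Given this naming rule, the ground truth paths for an image can be derived from the image path alone. The sketch below assumes lowercase extensions and that the ground truth sits in the same directory as the image; neither is confirmed above, and the file name sign01.jpg is made up for illustration.

```python
from pathlib import PurePosixPath


def ground_truth_paths(image_path):
    """Return the (XML path, BMP path) expected for a given image path."""
    p = PurePosixPath(image_path)
    # Assumption: lowercase extensions, same directory as the image.
    return p.with_suffix(".xml"), p.with_suffix(".bmp")


xml_path, bmp_path = ground_truth_paths("images/sign01.jpg")
print(xml_path, bmp_path)
```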

Related Dataset

Related Tasks

Submitted Files

The ground truth information is stored along with the original images in the zip files of the dataset. Download the KAIST dataset here.
