The Street View Text Dataset

Created: 2012-10-06
Last updated: 2012-10-16

Contact Author

Kai Wang
EBU3B, Room 4148
Department of Comp. Sci. and Engr.
University of California, San Diego
9500 Gilman Drive, Mail Code 0404
La Jolla, CA 92093-0404 
Email: k...@cs.ucsd.edu

Current Version

[Image: Example images from the Street View Text dataset.]

1.0 (also available from the author's Web site: http://vision.ucsd.edu/~kai/svt/)

Keywords

OCR, Real Scene, Urban Scene, Scene Text, Word Spotting, Scene Text Recognition, Scene Text Detection, Scene Text Localization

Description

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. Outdoor street-level imagery has two useful characteristics: (1) image text often comes from business signage, and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses. More details about the dataset can be found in our paper, Word Spotting in the Wild [1]. For up-to-date benchmarks on this data, see our paper, End-to-end Scene Text Recognition [2].
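
In the lexicon-driven setting a recognizer does not face an open vocabulary: each image comes with a small list of candidate words, and the task is to decide which of them appear. As a minimal illustration of why that constraint helps (this is not the actual recognition method of [1] or [2]), a hypothetical recognizer's raw output string can be snapped to the closest lexicon entry by edit distance:

  def edit_distance(a, b):
      # Classic Levenshtein distance via two-row dynamic programming.
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          cur = [i]
          for j, cb in enumerate(b, 1):
              cur.append(min(prev[j] + 1,      # deletion
                             cur[j - 1] + 1,   # insertion
                             prev[j - 1] + (ca.lower() != cb.lower())))  # substitution
          prev = cur
      return prev[-1]

  def snap_to_lexicon(raw, lexicon):
      # Constrain a raw reading to the per-image lexicon.
      return min(lexicon, key=lambda word: edit_distance(raw, word))

  # A noisy reading of a storefront sign snaps to the intended word:
  print(snap_to_lexicon("5TARBUCK5", ["STARBUCKS", "SUBWAY", "PIZZA"]))  # STARBUCKS

Real systems score lexicon words directly against the image rather than post-correcting an output string, but the sketch shows why a small per-image lexicon makes the problem tractable.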

This dataset has only word-level annotations (no character bounding boxes) and should be used for two tasks (see the parsing sketch below):

  • cropped lexicon-driven word recognition and
  • full-image lexicon-driven word detection and recognition.

If you need character-level training data, you should look into the Chars74K and ICDAR datasets.
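
The ground truth is not documented on this page, so the reader sketch below rests on an assumption: the commonly distributed train.xml/test.xml layout, with an <imageName> per image, a comma-separated <lex> element holding the lexicon, and <taggedRectangle> boxes whose <tag> child is the ground-truth word. Verify the element names against the actual download before relying on them:

  import xml.etree.ElementTree as ET
  from PIL import Image  # pip install pillow

  def load_svt(xml_path):
      # Yield (image path, per-image lexicon, word boxes) per annotated image.
      # NOTE: element names assume the common SVT distribution; check your copy.
      for image in ET.parse(xml_path).getroot().iter("image"):
          name = image.findtext("imageName")  # path relative to the dataset root
          lexicon = (image.findtext("lex") or "").split(",")
          boxes = [(int(r.get("x")), int(r.get("y")),
                    int(r.get("width")), int(r.get("height")),
                    r.findtext("tag"))
                   for r in image.iter("taggedRectangle")]
          yield name, lexicon, boxes

  # Crop every annotated word for the cropped-word recognition task.
  for name, lexicon, boxes in load_svt("test.xml"):
      img = Image.open(name)
      for x, y, w, h, word in boxes:
          crop = img.crop((x, y, x + w, y + h))  # (left, upper, right, lower)
          # hand `crop` and `lexicon` to a recognizer; `word` is the ground truth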



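References

  1. K. Wang and S. Belongie. "Word Spotting in the Wild". In Proceedings of the European Conference on Computer Vision (ECCV), 2010.
  2. K. Wang, B. Babenko, and S. Belongie. "End-to-End Scene Text Recognition". In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2011.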