Difference between revisions of "Datasets List"

From TC11
= Mixed Content Documents =
* [http://www.umiacs.umd.edu/~zhugy/Tobacco800.html Tobacco800 Document Image Database] - composed of 1290 document images collected and scanned using a wide variety of equipment over time.
  
 

Revision as of 19:31, 27 January 2011


Last updated: 2011-01-27

See the datasets sorted according to the journal or conference in which they first appeared.

Machine-printed Documents

Graphical Documents

Scene Text

Mixed Content Documents

Handwritten Documents

On-line

Off-line

  • IAM Database - A full English sentence database for off-line handwriting recognition.
  • MARG - Medical Article Records Groundtruth - A freely available repository of document page images and their associated textual and layout data. The data has been reviewed and corrected to establish its "ground truth". For more information, please contact Dr. George Thoma (thoma@lhc.nlm.nih.gov) at the National Library of Medicine.



This page is editable only by TC11 Officers.