CVL-Database

Created: 2013-07-22
Last updated: 2013-08-24

CVL-Database - An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting

Contact Author

Markus Diem
Stefan Fiel
Florian Kleber
Robert Sablatnig
sab@caa.tuwien.ac.at

Copyright

CVL Database is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License [1].

This database may be used for non-commercial research purposes only. If you publish material based on this database, we request that you include a reference to the publication listed below.

Current Version

1.0

Keywords

Writer Identification, Word Spotting, Cursive Handwriting

Description

The CVL Database is a public database for writer retrieval, writer identification and word spotting. The database consists of 7 different handwritten texts (1 German and 6 English texts) written by 309 different writers. For each text, an RGB color image (300 dpi) comprising the handwritten text and the printed text sample is available, as well as a cropped version (handwritten text only). A unique id identifies the writer, and the bounding boxes for each single word are stored in an XML file.

The CVL-database consists of images with cursively handwritten German and English texts which have been chosen from literary works. All pages have a unique writer id and the text number (separated by a dash) at the upper right corner, followed by the printed sample text. The text is placed between two horizontal separators. Beneath the printed text, individuals were asked to write the text using a ruled undersheet to prevent curled text lines. The layout follows the style of the database.

Samples of the following texts have been used:

  • Edwin A. Abbott - Flatland: A Romance of Many Dimensions (92 words).
  • William Shakespeare - Macbeth (49 words).
  • Wikipedia - Mailüfterl (73 words, under CC Attribution-ShareAlike License).
  • Charles Darwin - Origin of Species (52 words).
  • Johann Wolfgang von Goethe - Faust. Eine Tragödie (50 words).
  • Oscar Wilde - The Picture of Dorian Gray (66 words).
  • Edgar Allan Poe - The Fall of the House of Usher (78 words).


Metadata and Technical Details

All pages have a unique writer id and the text number (separated by a dash) at the upper right corner, followed by the printed sample text. The text is placed between two horizontal separators. The files are named according to the unique writer id and the text number. In addition, text lines and words are extracted. Their filenames follow the same convention, with the text line number and the word number, respectively, appended at the end. For word images, the GT (ground truth) entry is the last part of the filename. The bounding boxes for each single word are stored in an XML file according to the unique id.
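
The naming convention described above can be illustrated with a short Python sketch. It is not part of the dataset distribution. The dash separator between all fields, the field order writer-text-line-word-GT, the optional file extension, and the names CvlName and parse_cvl_name are assumptions made for illustration only; check them against the downloaded files before relying on them.

```python
import re
from typing import NamedTuple, Optional


class CvlName(NamedTuple):
    """Fields encoded in a file name (hypothetical container, not part of the dataset)."""
    writer_id: str                # kept as a string to preserve leading zeros
    text_number: int
    line_number: Optional[int]    # None for page images
    word_number: Optional[int]    # None for page and line images
    ground_truth: Optional[str]   # None unless the name refers to a word image


# ASSUMPTION: fields are joined by dashes in the order
# writer-text[-line[-word-GT]], followed by an optional file extension.
_NAME = re.compile(
    r"^(?P<writer>\d+)-(?P<text>\d+)"
    r"(?:-(?P<line>\d+)(?:-(?P<word>\d+)-(?P<gt>.+?))?)?"
    r"(?:\.[A-Za-z0-9]+)?$"
)


def parse_cvl_name(filename: str) -> CvlName:
    """Split a page, line, or word image name into its components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")

    def to_int(value: Optional[str]) -> Optional[int]:
        return int(value) if value is not None else None

    return CvlName(
        writer_id=match["writer"],
        text_number=int(match["text"]),
        line_number=to_int(match["line"]),
        word_number=to_int(match["word"]),
        ground_truth=match["gt"],  # last part of a word image name
    )
```

For a hypothetical word image name such as "0001-1-3-2-Imagine.tif", the sketch would report writer "0001", text 1, line 3, word 2 and the ground truth "Imagine". A ground truth entry that itself contains a dot is only handled correctly when the file extension is present.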


Ground Truth Data

Related Tasks


References

Markus Diem, Stefan Fiel, Florian Kleber and Robert Sablatnig, CVL-Database: An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting, In Proc. of the 12th Int. Conference on Document Analysis and Recognition (ICDAR) 2013, forthcoming.

Submitted Files

Version 1.0

Please refer to http://caa.tuwien.ac.at/cvl/research/cvl-database/index.html to download the files from the original dataset site.

