Chem-Infty Dataset: A ground-truthed dataset of Chemical Structure Images

From TC11

Revision as of 18:34, 1 July 2010

Created: 2010-06-28
Last updated: 2010-07-01

Contact Author

Koji Nakagawa (kn@kyudai.jp),
Faculty of Mathematics,
Kyushu University,
JAPAN
Akio Fujiyoshi (fujiyosi@mx.ibaraki.ac.jp),
Department of Computer and Information Sciences,
Ibaraki University,
JAPAN
Masakazu Suzuki (suzuki@math.kyushu-u.ac.jp),
Faculty of Mathematics,
Kyushu University,
JAPAN

License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.1 Japan License.

Current Version

1.0

Keywords

Optical Chemical Structure Recognition, Graphical Documents, Symbols

Description

This dataset consists of chemical structure images (the dataset) and their chemical meaning (see the ground truth section). The 5727 chemical images were collected at random from Japanese patent applications published in 2008.


  • Number of samples in the dataset: 869
  • File format: TIFF images, both binary and greyscale.
  • File name convention: the image files and their metadata files are named as follows:
    • 2008XXXXXX_N_chem.tif: a TIFF image file
    • 2008XXXXXX_N_chem.sdf: the metadata of 2008XXXXXX_N_chem.tif
    • The string '2008XXXXXX' is the patent ID, and 'N' indicates that the image is the N-th element of the multi-page TIFF file (see Reference [1]).
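
The naming convention above can be checked mechanically. The following Python sketch is only an illustration, not a tool shipped with the dataset. It assumes that each 'X' stands for a decimal digit and that 'N' is written as one or more digits, neither of which is stated explicitly on this page. The names `parse_chem_name`, `sdf_name_for`, and `ChemFileName` are hypothetical.

```python
import re
from pathlib import PurePath
from typing import NamedTuple, Optional

# Assumption: every 'X' is a decimal digit and 'N' is one or more digits.
_NAME_RE = re.compile(
    r"^(?P<patent_id>2008\d{6})_(?P<index>\d+)_chem\.(?P<ext>tif|sdf)$"
)


class ChemFileName(NamedTuple):
    """Parts of a dataset file name (hypothetical helper type)."""
    patent_id: str  # e.g. "2008123456"
    index: int      # position N within the multi-page TIFF file
    ext: str        # "tif" for an image, "sdf" for its metadata


def parse_chem_name(path: str) -> Optional[ChemFileName]:
    """Split a file name into patent ID, index, and extension.

    Returns None when the name does not follow the convention.
    """
    m = _NAME_RE.match(PurePath(path).name)
    if m is None:
        return None
    return ChemFileName(m["patent_id"], int(m["index"]), m["ext"])


def sdf_name_for(tif_path: str) -> Optional[str]:
    """Return the metadata file name that belongs to a TIFF image name."""
    parsed = parse_chem_name(tif_path)
    if parsed is None or parsed.ext != "tif":
        return None
    return PurePath(tif_path).with_suffix(".sdf").name
```

For example, `sdf_name_for("2008123456_3_chem.tif")` yields `"2008123456_3_chem.sdf"`, where the patent ID `2008123456` is made up for illustration. If the real file names pad 'N' with leading zeros, the pattern still matches because it accepts any run of digits.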

When you use or distribute this dataset, please inform the authors of your contact information (Name, Affiliation, E-mail address).

Disclaimer: Although the authors tried their best to provide an error-free dataset, there might be some incorrect data. If you encounter any such errors, please report them back to the authors so that the data can be updated.

Related Ground Truth Data

Related Tasks

None defined

References

  1. Koji Nakagawa, Akio Fujiyoshi, and Masakazu Suzuki. Ground-Truthed Dataset of Chemical Structure Images in Japanese Published Patent Applications. In the proceedings of the 9th International Workshop on Document Analysis Systems (DAS'2010), pp 455-462, June 9-11, 2010, Boston, MA, USA.
  2. Akio Fujiyoshi, Koji Nakagawa, and Masakazu Suzuki. Robust Recognition Method of Chemical Structure Images for Japanese Published Patent Applications. Available as a short paper in the web page of the 9th International Workshop on Document Analysis Systems (DAS'2010), June 9-11, 2010, Boston, MA, USA.
  3. CTfile Formats Specification (http://www.symyx.com/solutions/white_papers/ctfile_formats.jsp)

Submitted Files

Version 1.0

