Chem-Infty Dataset: A ground-truthed dataset of Chemical Structure Images
Contact Author
Koji Nakagawa (kn@kyudai.jp), Faculty of Mathematics, Kyushu University, JAPAN
Akio Fujiyoshi (fujiyosi@mx.ibaraki.ac.jp), Department of Computer and Information Sciences, Ibaraki University, JAPAN
Masakazu Suzuki (suzuki@math.kyushu-u.ac.jp), Faculty of Mathematics, Kyushu University, JAPAN
License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.1 Japan License.
Current Version
1.0
Keywords
Optical Chemical Structure Recognition, Graphical Documents, Symbols
Description
This dataset consists of chemical structure images and ground truth describing their chemical meaning (see the Related Ground Truth Data section). The 5727 chemical images were collected at random from Japanese patent applications published in 2008.
- Number of samples in the dataset: 869
- File format: TIFF images, both binary and greyscale.
- File name convention: the image files and their metadata files are named as follows:
- 2008XXXXXX_N_chem.tif: a TIFF image file
- 2008XXXXXX_N_chem.sdf: the metadata for 2008XXXXXX_N_chem.tif
- The string '2008XXXXXX' is the patent ID, and 'N' indicates that the image is the N-th element of the multi-page TIFF file (see Reference \[1\]).
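The naming convention above can be used to pair each image with its metadata file. The following is a minimal sketch, not part of the dataset distribution. It assumes that the 'X' placeholders are digits and that 'N' is a decimal integer; neither is stated explicitly on this page. The names `parse_name`, `pair_samples`, and `ChemFile` are hypothetical helpers introduced only for illustration.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Assumption: the patent ID is "2008" followed by six digits, and N is a
# decimal integer. Adjust the pattern if the actual files differ.
NAME_RE = re.compile(
    r"^(?P<patent_id>2008\d{6})_(?P<index>\d+)_chem\.(?P<ext>tif|sdf)$"
)


class ChemFile(NamedTuple):
    patent_id: str  # e.g. "2008123456"
    index: int      # position N within the multi-page TIFF
    ext: str        # "tif" (image) or "sdf" (metadata)


def parse_name(filename: str) -> Optional[ChemFile]:
    """Split a file name into its parts, or return None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return ChemFile(m["patent_id"], int(m["index"]), m["ext"])


def pair_samples(root: Path) -> dict:
    """Map (patent_id, index) to {"tif": path, "sdf": path} for files under root."""
    pairs = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        info = parse_name(path.name)
        if info is None:
            continue  # skip files that do not follow the convention
        pairs.setdefault((info.patent_id, info.index), {})[info.ext] = path
    return pairs
```

Entries in the result of `pair_samples` that lack either the `"tif"` or the `"sdf"` key point to an image without metadata, or the reverse, which may be worth reporting to the authors.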
When you use or distribute this dataset, please inform the authors of your contact information (Name, Affiliation, E-mail address).
Disclaimer: Although the authors tried their best to provide an error-free dataset, there might be some incorrect data. If you encounter any such errors, please report them back to the authors so that the data can be updated.
Related Ground Truth Data
Related Tasks
None defined
References
- I. V. Filippov and M. C. Nicklaus. Extracting chemical structure information: Optical structure recognition application. In Pre-Proceedings of the 8th IAPR International Workshop on Graphics Recognition (GREC 2009), pages 133–142, 2009.
- J. Park, G. R. Rosania, K. A. Shedden, M. Nguyen, N. Lyu, and K. Saitou. Automated extraction of chemical structure information from digital raster images. Chemistry Central Journal, 3(4), 2009.
- N. Sadawi. Recognising chemical formulas from molecule depictions. In Pre-Proceedings of the 8th IAPR International Workshop on Graphics Recognition (GREC 2009), pages 167–175, 2009.
- A. T. Valko and A. P. Johnson. CLiDE Pro: The latest generation of CLiDE, a tool for optical chemical structure recognition. J. Chem. Inf. Model., 49(4):780–787, 2009.
Submitted Files
Version 1.0
- ChemInfty Dataset (69MB)