Chem-Infty Dataset: A ground-truthed dataset of Chemical Structure Images
Contact Author
Koji Nakagawa (kn[at]kyudai.jp), Faculty of Mathematics, Kyushu University, JAPAN
Akio Fujiyoshi (fujiyosi[at]mx.ibaraki.ac.jp), Department of Computer and Information Sciences, Ibaraki University, JAPAN
Masakazu Suzuki (suzuki[at]math.kyushu-u.ac.jp), Faculty of Mathematics, Kyushu University, JAPAN
License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.1 Japan License
Current Version
1.0
Keywords
Optical Chemical Structure Recognition, Graphical Documents, Symbols
Description
This dataset consists of chemical structure images (the dataset) and their chemical meaning (the ground truth; see the ground truth section). The 5727 chemical images were randomly collected from Japanese patent applications published in 2008.
- Number of samples in the dataset: 869
- File format: TIFF images, both binary and greyscale.
- File name convention: image files and their metadata files are named as follows:
- 2008XXXXXX_N_chem.tif: a TIFF image file
- 2008XXXXXX_N_chem.sdf: the metadata for 2008XXXXXX_N_chem.tif
- The string '2008XXXXXX' is the patent ID, and 'N' indicates the N-th element of the multi-page TIFF file (see Reference [1]).
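The naming convention above can be used to pair each image with its metadata file. The following is a minimal sketch, not part of the dataset distribution. It assumes that 'XXXXXX' stands for six digits and that 'N' is written as a plain decimal integer; neither detail is stated on this page. The helper names `parse_name` and `pair_samples` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: 'XXXXXX' is six digits and 'N' is an unpadded decimal integer.
NAME_RE = re.compile(
    r"^(?P<patent_id>2008\d{6})_(?P<index>\d+)_chem\.(?P<ext>tif|sdf)$"
)


def parse_name(filename):
    """Return (patent_id, index, extension), or None if the name does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return match.group("patent_id"), int(match.group("index")), match.group("ext")


def pair_samples(filenames):
    """Map (patent_id, index) to (tif_path, sdf_path) for samples that have both files."""
    grouped = defaultdict(dict)
    for name in filenames:
        parsed = parse_name(Path(name).name)
        if parsed is None:
            continue  # skip files that do not follow the convention
        patent_id, index, ext = parsed
        grouped[(patent_id, index)][ext] = name
    return {
        key: (files["tif"], files["sdf"])
        for key, files in sorted(grouped.items())
        if "tif" in files and "sdf" in files
    }
```

For an extracted copy of the archive, `pair_samples(str(p) for p in Path(some_directory).iterdir())` would list the complete samples, where `some_directory` is whatever location the files were unpacked to.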
When you use or distribute this dataset, please inform the authors of your contact information (Name, Affiliation, E-mail address).
Disclaimer: Although the authors tried their best to provide an error-free dataset, there might be some incorrect data. If you encounter any such errors, please report them back to the authors so that the data can be updated.
Related Datasets
- CLiDE (Chemical Literature Data Extraction) Validation Set.
- OSRA: Optical Structure Recognition. Validation data of US Patent.
Related Ground Truth Data
Related Tasks
None defined
References
- Koji Nakagawa, Akio Fujiyoshi, and Masakazu Suzuki. Ground-Truthed Dataset of Chemical Structure Images in Japanese Published Patent Applications. In Proceedings of the 9th International Workshop on Document Analysis Systems (DAS'2010), pp. 455-462, June 9-11, 2010, Boston, MA, USA.
- Akio Fujiyoshi, Koji Nakagawa, and Masakazu Suzuki. Robust Recognition Method of Chemical Structure Images for Japanese Published Patent Applications. Available as a short paper on the web page of the 9th International Workshop on Document Analysis Systems (DAS'2010), June 9-11, 2010, Boston, MA, USA.
- CTfile Formats Specification
Submitted Files
Version 1.0
- ChemInfty Dataset (69MB)