ChemInfty - Chemical Structure GT
Keywords
Chemical Structure Recognition, Diagram Recognition, Character Recognition
Description
The images were first processed by our initial recognition engine, and the results were corrected manually. The output of this step is called the 'ChemInfty Graphical Structure GT'; it records the positions of characters and lines.
Separate software then processed the 'ChemInfty Graphical Structure GT' to extract the chemical structure representation. These results were also corrected manually, yielding the 'ChemInfty Chemical Structure GT'.
The files are in the widely used MDL SDF format, one of the CTfile formats. The specification of the SDF format can be downloaded from here.
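As a rough illustration of how an SDF file is laid out, the sketch below reads the connection table of each record using only the Python standard library. It assumes V2000-style fixed-width records separated by `$$$$` lines; the function names and the `SAMPLE` record are hypothetical and are not taken from the dataset. A cheminformatics toolkit would normally be a better choice for real use.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def parse_molblock(lines):
    """Parse the V2000 connection table at the start of one record.

    Lines 0-2 are the header, line 3 is the counts line, then come the
    atom block and the bond block, all in fixed-width columns.
    """
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y = float(line[0:10]), float(line[10:20])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # (first atom index, second atom index, bond order), 1-based indices
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Hypothetical single-record file (heavy atoms of ethanol), for illustration only.
SAMPLE = "\n".join([
    "ethanol",
    "  hypothetical",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.5000    0.0000    0.0000 C   0  0",
    "    2.2500    1.3000    0.0000 O   0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
    "$$$$",
])

atoms, bonds = parse_molblock(split_sdf(SAMPLE)[0])
```

Here `atoms` holds one `(symbol, x, y)` tuple per atom and `bonds` one `(from, to, order)` tuple per bond. Data items that may follow `M  END` in a record are ignored by this sketch.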
In addition to the complete dataset, a more focused subset is supplied, containing only organic molecules with at least 5 heavy (non-hydrogen) atoms and a molecular weight below 1,000. This subset represents chemical structures that are potentially of interest to medicinal chemists and the pharmaceutical industry. The subset was proposed and prepared by Igor Filippov (http://cactus.nci.nih.gov/osra/), to whom many thanks are due.
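The subset criteria above can be expressed as a simple check. The sketch below is not the procedure actually used to build the subset: it assumes each molecule is given as a full molecular formula (hydrogens included) in a dict, and it treats "organic" as "contains at least one carbon atom", which is an assumption since the page does not define the term. The function name and the small mass table are illustrative.

```python
# Standard atomic masses for a few common elements (deliberately incomplete).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}


def in_focused_subset(formula):
    """Check the three stated criteria for a formula like {"C": 9, "H": 8, "O": 4}.

    Raises KeyError for elements missing from ATOMIC_MASS.
    """
    heavy_atoms = sum(n for element, n in formula.items() if element != "H")
    weight = sum(ATOMIC_MASS[element] * n for element, n in formula.items())
    is_organic = formula.get("C", 0) > 0  # assumption: organic means "has carbon"
    return is_organic and heavy_atoms >= 5 and weight < 1000


aspirin = {"C": 9, "H": 8, "O": 4}   # 13 heavy atoms, weight about 180
ethanol = {"C": 2, "H": 6, "O": 1}   # only 3 heavy atoms
print(in_focused_subset(aspirin), in_focused_subset(ethanol))
```

Note that SDF records often leave hydrogens implicit, so the molecular weight cannot be taken from the atom block alone; a toolkit that adds implicit hydrogens would be needed before applying a check like this to the dataset files.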
Related Dataset
Related Tasks
None defined
Submitted Files