ChemInfty - Chemical Structure GT
Latest revision as of 18:02, 27 January 2011
Keywords
Chemical Structure Recognition, Diagram Recognition, Character Recognition
Description
First, the images were processed by our initial recognition engine, and the results were corrected manually. The result of this step is called the 'ChemInfty Graphical Structure GT'; it essentially records the positions of characters and lines.
Then, using separate software, the 'ChemInfty Graphical Structure GT' was further processed to extract the chemical structure representation. These results were also corrected manually, yielding the 'ChemInfty Chemical Structure GT'.
The ground truth is stored in the widely used MDL SDF format, which is one of the CTfile formats. The SDF specification is available for download (http://www.symyx.com/solutions/white_papers/ctfile_formats.jsp).
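As a rough illustration of how SDF ground-truth files can be read, the sketch below splits SDF text into records (each record ends with a `$$$$` line) and reads the atom symbols and bonds from a V2000 connection table. It assumes the fixed-column V2000 layout; the function names and the small ethanol sample are hypothetical and are not taken from the dataset. For real work a cheminformatics toolkit is likely the better choice.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def parse_molfile(lines):
    """Return (atom symbols, bonds) from a V2000 molfile block.

    Assumes the fixed-column V2000 layout: three header lines, then a
    counts line whose first two 3-character fields hold the atom and
    bond counts.
    """
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    # The element symbol occupies columns 32-34 of each atom line.
    atoms = [lines[4 + i][31:34].strip() for i in range(n_atoms)]
    bonds = []
    for i in range(n_bonds):
        b = lines[4 + n_atoms + i]
        # first atom index, second atom index, bond order (1-based indices)
        bonds.append((int(b[0:3]), int(b[3:6]), int(b[6:9])))
    return atoms, bonds


def _atom_line(x, y, z, symbol):
    # Hypothetical helper used only to build the sample record below.
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0"


def _bond_line(a, b, order):
    return f"{a:3d}{b:3d}{order:3d}  0"


# Hypothetical one-record sample: the heavy atoms of ethanol (C-C-O).
SAMPLE = "\n".join([
    "ethanol",
    "  hand-written sample",
    "",
    f"{3:3d}{2:3d}  0  0  0  0  0  0  0  0999 V2000",
    _atom_line(0.0, 0.0, 0.0, "C"),
    _atom_line(1.5, 0.0, 0.0, "C"),
    _atom_line(2.2, 1.2, 0.0, "O"),
    _bond_line(1, 2, 1),
    _bond_line(2, 3, 1),
    "M  END",
    "$$$$",
]) + "\n"

if __name__ == "__main__":
    for record in split_sdf(SAMPLE):
        print(record[0], parse_molfile(record))
```

The parser reads only the counts line, atom symbols, and bond triples; charges, stereo flags, property lines, and data items after `M  END` are ignored in this sketch.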
In addition to the complete dataset ground truth, a more focused subset (http://www.iapr-tc11.org/dataset/ChemInfty/ChemInfty-SDFs-v100-focused-2010-06-30.zip) is supplied. It contains only organic molecules with at least 5 heavy (non-hydrogen) atoms and a molecular weight below 1,000. This subset represents chemical structures that are potentially of interest to medicinal chemists and the pharmaceutical industry. The subset was proposed and prepared by Igor Filippov (http://cactus.nci.nih.gov/osra/), to whom many thanks are due.
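The selection criteria for the focused subset can be sketched as a simple predicate. This is only an illustration under stated assumptions, not the procedure actually used to build the subset: a molecule is given as a hypothetical element-count dictionary (hydrogens included), "organic" is approximated as "contains at least one carbon atom", and the atomic-weight table is a small rounded excerpt.

```python
# Rounded standard atomic weights for a few common elements (excerpt only).
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "Na": 22.990, "P": 30.974, "S": 32.06, "Cl": 35.45,
    "Br": 79.904, "I": 126.904,
}


def molecular_weight(formula):
    """Sum atomic weights over an element -> count mapping."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in formula.items())


def heavy_atom_count(formula):
    """Count all atoms other than hydrogen."""
    return sum(n for element, n in formula.items() if element != "H")


def in_focused_subset(formula, min_heavy=5, max_weight=1000.0):
    """Apply the three stated criteria to one molecule.

    Assumption: 'organic' is approximated as 'contains carbon'; the
    source does not say which definition was actually applied.
    """
    is_organic = formula.get("C", 0) > 0
    return (
        is_organic
        and heavy_atom_count(formula) >= min_heavy
        and molecular_weight(formula) < max_weight
    )


if __name__ == "__main__":
    benzene = {"C": 6, "H": 6}
    methane = {"C": 1, "H": 4}
    print(in_focused_subset(benzene), in_focused_subset(methane))
```

Under these assumptions benzene passes all three checks, while methane is rejected for having fewer than 5 heavy atoms.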
Related Dataset
Related Tasks
None defined
Submitted Files