DAS2010 Special Session on Contributed Datasets
Revision as of 14:22, 24 March 2010
Overview – Message from TC-11
It is extremely important for the Document Image Analysis and Recognition (DIAR) community to be able to cross-check and reproduce results described in published papers in the field. To achieve this, any datasets used as the basis for publications should be publicly available, as is the norm in many other disciplines.
Authors of DAS 2010 papers are actively encouraged to submit the datasets they used to train and/or evaluate their algorithms to TC-11, so that these datasets can be published on the TC-11 Web site.
This initiative is not restricted to datasets. At TC-11 we are interested in archiving online any piece of data (ground-truth data, software, etc.) that would make it easy to reproduce results, set new targets, foster healthy competition, encourage collaboration and generally advance the DIAR field as a whole.
Submission Protocol for DAS 2010 Datasets
The process for submitting a dataset to TC-11 is as follows:
- Fill in the form below, and send it by email to Dimosthenis Karatzas (dimos@cvc.uab.es), the TC-11 dataset curator.
- The TC-11 dataset curator will review the submission request and ensure that all information is clear and complete and that any copyright issues are properly addressed.
- The TC-11 dataset curator will work with you to upload the dataset to the TC-11 Web site. Depending on the nature of the dataset, this may be as simple as sending a CD or uploading the required files.
TC-11 is actively working towards a more comprehensive framework for managing datasets and associated information. When this framework is ready, we will contact authors again and work with them to introduce their datasets into it.
Copyright Note
TC-11 provides dataset hosting services as a benefit to the international research community. If it is determined that copyrighted material has been improperly included in a dataset submitted for inclusion on the TC-11 Web site, we will immediately remove the offending material upon notification by the copyright holder.
By submitting a dataset for inclusion on the TC-11 Web site, the author certifies that he/she has the right to publish the dataset and any associated data in the public domain, and that doing so does not violate the intellectual property rights or copyrights of any third party.
TC-11 will provide a service through which the submitted dataset and any associated data will be made public to the Document Analysis community worldwide. Should any legal dispute arise in the future in relation to the publication of this dataset and associated data in the public domain, the author will hold TC-11 free from any wrongdoing and accept responsibility for the publication of these data.
By submitting a dataset and associated data to TC-11, you explicitly accept that any third party can independently submit additional information relating to the original dataset (e.g. additional ground-truth data, software, etc.).
We strongly encourage authors who own the copyright of the submitted information to consider offering it to the community under a Creative Commons license [1] [2] (http://creativecommons.org/choose/).
Useful Definitions
- Dataset: A collection of data along with the metadata required to use these data.
- Ground Truth Specification: The definition of the information required to accurately describe a particular aspect of the data at a high level, where agreement between different observers can be established, together with the definition of an appropriate structure (format) for storing this information. A ground truth specification does not imply that any actual ground truth data are present, nor does it assume an explicit association with a particular dataset.
- Ground Truth Data: A set of data conforming to a particular ground truth specification and relating to a specific dataset.
- Metadata: Information specific to a particular dataset. Metadata are usually tightly structured within the dataset itself (e.g. information coded within the filenames of submitted images). Metadata can only be submitted together with the dataset.
- Task: A well-defined process to evaluate algorithms in the context of a specific scientific problem. A task would typically provide a specific evaluation protocol and link to the specific resources it requires (a dataset, and usually related ground truth data).
- Resources: Any other type of related resources not specifically covered by the above definitions. Examples include software to browse and visualise a dataset, software to create ground truth data, algorithms to do performance evaluation, codecs, reports, publications, etc.
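The relationships between these definitions can be summarised as a small data model. The Python sketch below is illustrative only: the class names, their fields, the example values and the "<writer>_<page>.<ext>" filename convention used to show metadata coded in filenames are all hypothetical assumptions, not a TC-11 format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def metadata_from_filename(filename: str) -> Dict[str, str]:
    """Recover metadata coded in a filename.

    Assumes a HYPOTHETICAL "<writer>_<page>.<ext>" convention, e.g. "w012_p03.tif".
    """
    stem = filename.rsplit(".", 1)[0]
    writer, page = stem.split("_", 1)
    return {"writer": writer, "page": page}


@dataclass
class Dataset:
    """A collection of data plus the metadata required to use it."""
    name: str
    files: List[str]
    # Per-file metadata, fixed when the dataset is submitted.
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)


@dataclass
class GroundTruthSpecification:
    """What to record and how to store it; holds no data and names no dataset."""
    name: str
    fields: List[str]


@dataclass
class GroundTruthData:
    """Annotations conforming to one specification and relating to one dataset."""
    specification: GroundTruthSpecification
    dataset: Dataset
    annotations: Dict[str, Dict[str, str]]

    def is_consistent(self) -> bool:
        # Each annotated file must belong to the dataset, and each annotation
        # must use exactly the fields defined by the specification.
        files = set(self.dataset.files)
        expected = set(self.specification.fields)
        return all(
            name in files and set(values) == expected
            for name, values in self.annotations.items()
        )


@dataclass
class Task:
    """An evaluation protocol for one problem, linked to the resources it needs."""
    name: str
    evaluation_protocol: str
    dataset: Dataset
    ground_truth: Optional[GroundTruthData] = None
    # Other resources: viewers, ground-truthing tools, evaluation software, etc.
    resources: List[str] = field(default_factory=list)


# Usage with hypothetical example values.
files = ["w012_p03.tif", "w015_p01.tif"]
dataset = Dataset("example-handwriting", files,
                  {f: metadata_from_filename(f) for f in files})
spec = GroundTruthSpecification("line-transcription", ["text"])
gt = GroundTruthData(spec, dataset, {"w012_p03.tif": {"text": "sample line"}})
task = Task("line-recognition", "character error rate", dataset, gt)
print(gt.is_consistent())  # prints True
```

Keeping the specification separate from the ground truth data mirrors the definitions: one specification can be reused for ground truth on several datasets, and additional ground truth data can later be attached to an existing dataset without changing the dataset itself.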
Submission Form
The submission form has four sections. Please fill in each section as applicable in your situation.