Difference between revisions of "Softwares"

From TC11
At the moment, please see [http://www.iapr-tc11.org/software.html].
== On-line handwriting ==
* [http://unipen.nici.kun.nl/uptools3/ '''uptools:'''] Tools for reading and processing files in the UNIPEN file format.
* [http://www.alphaworks.ibm.com/tech/comparehwr Comparison Tools for Handwriting Recognizers] using the UNIPEN format (Gene Ratzlaff, IBM)

== Off-line handwriting ==
* [http://esewww.essex.ac.uk/research/vasa/hueweb/huereal.html '''HUE:'''] a software toolkit which supports the rapid development and re-use of handwriting and document analysis systems (Univ. of Essex, UK).

== OCR ==
* [http://documents.cfar.umd.edu/ocr/  Public domain OCR software] (Univ. of Maryland, USA)
* [http://documents.cfar.umd.edu/resources/source/  Source code at the DIMUND server] (Univ. of Maryland, USA)
* [http://www.fmi.uni-passau.de/~buckley/OCR.html Optical Character Recognition sources]

== Pixels vs Vectors ==
* [http://sourceforge.net/projects/autotrace AutoTrace] bitmap to vector conversion

== Pattern classification ==
* [http://www-ai.cs.uni-dortmund.de/SOFTWARE/SVM_LIGHT/svm_light.eng.html  Support-Vector Machine: SVM<sup>light</sup>] Well-designed light-weight package for experimentation with the support-vector classifier. Several kernel functions are supported. ASCII data files.
* [http://www.idiap.ch/learning/SVMTorch.html  SVM Torch-II] is a new implementation of Vapnik's Support Vector Machine that works both for classification and regression problems, and that has been specifically tailored for large-scale problems (such as more than 20000 examples, even for input dimensions higher than 100).
* [http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html Discrete-HMM kernel in C++] Originally developed for speech recognition, this generic package (ASCII data files!) allows quick experimentation with discrete hidden Markov modeling. The main program handles a single HMM, so multi-class recognition can be realized with (Unix) scripts.
* [http://ic.arc.nasa.gov/ic/projects/bayes-group/autoclass/ '''AutoClass:'''] An unsupervised Bayesian classification program (NASA). Some data modeling (e.g., specifying all feature scale types) and structuring of the (ASCII) files are required.
* [http://astro.u-strasbg.fr/~fmurtagh/mda-sw/pca.c '''PCA:'''] Principal Components Analysis, a compact single main program written in C. Reads ASCII input files.

== Information Retrieval ==
* [ftp://ftp.cs.cornell.edu/pub/smart/ '''SMART 11.0:'''] A package implementing the keyword vector-space approach for IR as introduced by Salton (1961). The source code is for SunOS, but has been ported to Linux by several [http://www.linuxgazette.com/issue13/smart.html groups]. There is extensive [http://broncho.ct.monash.edu.au/~maria/Smart/hands-on-tekst.html documentation] on the web.

== Tools for (linguistic) post processing ==
* [http://unipen.nici.kun.nl/scrawls/dictionaries/udi/  '''Word lists'''] of a few Western languages.
* [http://bobo.link.cs.cmu.edu/index.html/ftp-site/link-grammar/system-4.1/  '''Link Grammar 4.1:'''] A parser for English, written in C, by [http://bobo.link.cs.cmu.edu/index.html/  Temperley, Sleator and Lafferty ] at Carnegie Mellon.
* [http://WWW-KSL-SVC.stanford.edu:5915/&service=frame-editor '''Ontolingua:'''] Semantic modeling tool on the WWW by Stanford University. <br /> There is a European [http://ontolingua.nici.kun.nl mirror site]. Ontologies can be exported in a number of formats, including KIF, CLIPS, LOOM and Prolog. This is a generic tool, but it can be used for content-related or document-related modeling in the context of machine reading.
----
== Benchmarking Tools ==
* [algoval.html Algoval] Internet-based algorithm evaluation. Several benchmarks in the area of TC-11 are already present (digit recognition, dictionary search, region-of-interest (ROI) detection). Algorithms in Java can be uploaded and compared (Simon Lucas, Univ. of Essex).
* [http://documents.cfar.umd.edu/resources/source/ppanther/ PinkPanther] document-segmentation benchmarking.
----
= General =
== Learning and Optimization ==
* [http://www.emsl.pnl.gov:2080/proj/neuron/neural/systems/shareware.html Neural Networks]
* [http://www.aic.nrl.navy.mil/galist/src/ Genetic Algorithms]
* [http://www.aic.nrl.navy.mil/~aha/research/machine-learning.html Machine Learning]
----
<font size="-1"> The software packages mentioned on this page are, mostly and preferably, available in source-code format (C, C++, Tcl/Tk, Java) and require standard ASCII input files. Please do not hesitate to send me a hint about free source code in the area of text processing on the Internet. <br />[mailto:schomaker@ai.rug.nl schomaker@ai.rug.nl] </font>

Revision as of 11:38, 28 August 2009
