Software

On-line handwriting

  • uptools: Tools for reading and processing files in the UNIPEN file format (a minimal reading sketch follows this list).
  • Comparison Tools for Handwriting Recognizers using the UNIPEN format (Gene Ratzlaff, IBM)
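
To give a feel for what a UNIPEN reader does, here is a minimal illustrative fragment (not part of uptools). It assumes the usual UNIPEN layout of dot-prefixed keyword lines, with coordinate lines between .PEN_DOWN and .PEN_UP whose first two fields are X and Y (the actual field order is declared by .COORD and may differ):

    // Illustrative UNIPEN-style reader (assumed layout): keyword lines start with '.',
    // coordinate lines between .PEN_DOWN and .PEN_UP are assumed to begin with "x y".
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct Point { double x, y; };
    typedef std::vector<Point> Stroke;

    int main(int argc, char** argv) {
        if (argc < 2) { std::cerr << "usage: unipen_strokes <file>\n"; return 1; }
        std::ifstream in(argv[1]);
        std::vector<Stroke> strokes;
        std::string line;
        bool pen_down = false;
        while (std::getline(in, line)) {
            if (!line.empty() && line[0] == '.') {                 // keyword line
                if (line.rfind(".PEN_DOWN", 0) == 0) { pen_down = true; strokes.push_back(Stroke()); }
                else if (line.rfind(".PEN_UP", 0) == 0) { pen_down = false; }
                continue;                                          // all other keywords ignored here
            }
            if (pen_down) {                                        // coordinate line
                std::istringstream ss(line);
                Point p;
                if (ss >> p.x >> p.y) strokes.back().push_back(p);
            }
        }
        std::cout << "strokes read: " << strokes.size() << "\n";
        return 0;
    }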

Off-line handwriting

  • HUE: a software toolkit which supports the rapid development and re-use of handwriting and document analysis systems (Univ. of Essex, UK).

OCR

  • Public domain OCR software (Univ. of Maryland, USA)
  • Source code at the DIMUND server (Univ. of Maryland, USA)
  • Optical Character Recognition sources

Pixels vs Vectors

  • AutoTrace: bitmap-to-vector conversion

Pattern classification

  • Support-Vector Machine: SVMlight, a well-designed, lightweight package for experimentation with the support-vector classifier. Several kernel functions are supported. ASCII data files. (A sketch of the kernel decision rule follows this list.)
  • SVMTorch II: a new implementation of Vapnik's Support Vector Machine that works for both classification and regression problems and has been specifically tailored to large-scale problems (more than 20,000 examples, even for input dimensions higher than 100).
  • Discrete-HMM kernel in C++: originally developed for speech recognition, this generic package (ASCII data files) allows quick experimentation with discrete hidden-Markov modeling. The main program handles a single HMM model, so multi-class recognition can be realized with (Unix) scripts (see the forward-algorithm sketch after this list).
  • AutoClass: an unsupervised Bayesian classification program (NASA). Some data modeling (e.g., specifying all feature scale types) and structuring of the (ASCII) files are required.
  • PCA: Principal Components Analysis, a compact single main program written in C. Reads ASCII input files. (A small sketch follows this list.)
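
As a pointer for the SVM entries above: at classification time, kernel-based packages such as SVMlight evaluate a decision value of the form f(x) = sum_i alpha_i y_i K(x_i, x) + b over the support vectors and take its sign. The sketch below uses an RBF kernel and made-up coefficients; it is purely illustrative and does not use SVMlight's file formats or API.

    #include <cmath>
    #include <iostream>
    #include <vector>

    // RBF kernel: K(a, b) = exp(-gamma * ||a - b||^2)
    double rbf(const std::vector<double>& a, const std::vector<double>& b, double gamma) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) { double d = a[i] - b[i]; d2 += d * d; }
        return std::exp(-gamma * d2);
    }

    // Two-class decision value: f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b
    double decision(const std::vector<std::vector<double> >& sv, const std::vector<double>& alpha,
                    const std::vector<int>& y, double b, double gamma, const std::vector<double>& x) {
        double f = b;
        for (std::size_t i = 0; i < sv.size(); ++i) f += alpha[i] * y[i] * rbf(sv[i], x, gamma);
        return f;
    }

    int main() {
        // Toy model: two support vectors with made-up coefficients (a trainer would supply these).
        std::vector<std::vector<double> > sv = { {0.0, 0.0}, {1.0, 1.0} };
        std::vector<double> alpha = {1.0, 1.0};
        std::vector<int> y = {-1, +1};
        double b = 0.0, gamma = 0.5;
        std::vector<double> x = {0.9, 0.8};
        std::cout << "predicted class: " << (decision(sv, alpha, y, b, gamma, x) >= 0.0 ? +1 : -1) << "\n";
        return 0;
    }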
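
For the discrete-HMM entry above, here is a minimal sketch (not the package's own code) of scoring a discrete observation sequence with a single HMM via the scaled forward algorithm. Running one such model per class and picking the highest log-likelihood is the script-level multi-class scheme the entry refers to; all probabilities below are toy values.

    #include <cmath>
    #include <iostream>
    #include <vector>

    // Log-likelihood of a discrete observation sequence under one HMM
    // (initial probabilities pi, transitions A, emissions B), scaled forward algorithm.
    double log_likelihood(const std::vector<double>& pi,
                          const std::vector<std::vector<double> >& A,
                          const std::vector<std::vector<double> >& B,
                          const std::vector<int>& obs) {
        const std::size_t N = pi.size();
        std::vector<double> alpha(N), next(N);
        double loglik = 0.0;
        for (std::size_t t = 0; t < obs.size(); ++t) {
            double scale = 0.0;
            for (std::size_t j = 0; j < N; ++j) {
                double s = (t == 0) ? pi[j] : 0.0;
                if (t > 0) for (std::size_t i = 0; i < N; ++i) s += alpha[i] * A[i][j];
                next[j] = s * B[j][obs[t]];                 // weight by emission probability
                scale += next[j];
            }
            for (std::size_t j = 0; j < N; ++j) alpha[j] = next[j] / scale;
            loglik += std::log(scale);                      // accumulate log of scaling factors
        }
        return loglik;
    }

    int main() {
        // Toy 2-state, 2-symbol model; in practice one model per character class,
        // and the class whose model gives the highest log-likelihood wins.
        std::vector<double> pi = {0.6, 0.4};
        std::vector<std::vector<double> > A = { {0.7, 0.3}, {0.4, 0.6} };
        std::vector<std::vector<double> > B = { {0.9, 0.1}, {0.2, 0.8} };
        std::vector<int> obs = {0, 1, 0, 0};
        std::cout << "log P(obs | model) = " << log_likelihood(pi, A, B, obs) << "\n";
        return 0;
    }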
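
For the PCA entry above: a minimal sketch of extracting the first principal component of a small data matrix by power iteration on the covariance matrix. The actual program reads ASCII files and computes several components; the in-memory toy data here are for illustration only.

    #include <cmath>
    #include <iostream>
    #include <vector>

    typedef std::vector<std::vector<double> > Matrix;

    // First principal component of mean-centered data via power iteration on the
    // covariance matrix (a full PCA program would extract several components).
    std::vector<double> first_component(const Matrix& data) {
        const std::size_t n = data.size(), d = data[0].size();
        std::vector<double> mean(d, 0.0);
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t j = 0; j < d; ++j) mean[j] += data[r][j] / n;
        Matrix cov(d, std::vector<double>(d, 0.0));
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t i = 0; i < d; ++i)
                for (std::size_t j = 0; j < d; ++j)
                    cov[i][j] += (data[r][i] - mean[i]) * (data[r][j] - mean[j]) / (n - 1);
        std::vector<double> v(d, 1.0);
        for (int it = 0; it < 200; ++it) {                 // power iteration
            std::vector<double> w(d, 0.0);
            for (std::size_t i = 0; i < d; ++i)
                for (std::size_t j = 0; j < d; ++j) w[i] += cov[i][j] * v[j];
            double norm = 0.0;
            for (std::size_t i = 0; i < d; ++i) norm += w[i] * w[i];
            norm = std::sqrt(norm);
            for (std::size_t i = 0; i < d; ++i) v[i] = w[i] / norm;
        }
        return v;
    }

    int main() {
        Matrix data = { {2.5, 2.4}, {0.5, 0.7}, {2.2, 2.9}, {1.9, 2.2}, {3.1, 3.0} };
        std::vector<double> pc = first_component(data);
        std::cout << "first component: " << pc[0] << " " << pc[1] << "\n";
        return 0;
    }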

Information Retrieval

  • SMART 11.0: a package implementing the keyword vector-space approach to IR introduced by Salton (1961). The source code is for SunOS but has been ported to Linux by several groups; there is extensive documentation on the Web.
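
For readers unfamiliar with the vector-space approach: documents and queries are represented as term-weight vectors and ranked by cosine similarity. The sketch below uses raw term frequencies on a toy two-document collection; SMART itself supports much richer weighting and indexing, and this code does not reflect its API.

    #include <cmath>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    typedef std::map<std::string, double> TermVector;

    // Term-frequency vector of a text (whitespace tokenization only).
    TermVector tf(const std::string& text) {
        TermVector v;
        std::istringstream ss(text);
        std::string w;
        while (ss >> w) v[w] += 1.0;
        return v;
    }

    // Cosine similarity between two sparse term vectors.
    double cosine(const TermVector& a, const TermVector& b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (const auto& kv : a) {
            na += kv.second * kv.second;
            auto it = b.find(kv.first);
            if (it != b.end()) dot += kv.second * it->second;
        }
        for (const auto& kv : b) nb += kv.second * kv.second;
        return dot / (std::sqrt(na) * std::sqrt(nb));
    }

    int main() {
        std::vector<std::string> docs;
        docs.push_back("handwriting recognition with hidden markov models");
        docs.push_back("optical character recognition of printed text");
        TermVector query = tf("handwriting recognition");
        for (std::size_t i = 0; i < docs.size(); ++i)
            std::cout << "doc " << i << " similarity: " << cosine(query, tf(docs[i])) << "\n";
        return 0;
    }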

Tools for (linguistic) post processing

  • Word lists of a few Western languages.
  • Link Grammar 4.1: A parser for English, written in C, by Temperley, Sleator and Lafferty at Carnegie Mellon.
  • Ontolingua: a semantic modeling tool on the Web by Stanford University.
    There is a European mirror site. Ontologies can be exported in a number of formats, including KIF, CLIPS, Loom and Prolog. This is a generic tool, but it can be used for content-related or document-related modeling in the context of machine reading.

Benchmarking Tools

  • Algoval: Internet-based algorithm evaluation. Several benchmarks in the area of TC-11 are already present (digit recognition, dictionary search, region-of-interest (ROI) detection). Algorithms in Java can be uploaded and compared (Simon Lucas, Univ. of Essex).
  • PinkPanther document-segmentation benchmarking.

General


The software packages mentioned on this page are, mostly and preferably, available in source-code form (C, C++, Tcl/Tk, Java) and require standard ASCII input files. Please do not hesitate to give me a hint about free source code in the area of text processing on the Internet:
schomaker@ai.rug.nl