Software
From TC11
== On-line handwriting ==
- uptools: Tools for reading and processing files in the UNIPEN file format.
- Comparison Tools for Handwriting Recognizers using the UNIPEN format (Gene Ratzlaff, IBM)
== Off-line handwriting ==
- HUE: a software toolkit which supports the rapid development and re-use of handwriting and document analysis systems (Univ. of Essex, UK).
== OCR ==
- Public domain OCR software (Univ. of Maryland, USA)
- Source code at the DIMUND server (Univ. of Maryland, USA)
- Optical Character Recognition sources
== Pixels vs Vectors ==
- AutoTrace: bitmap-to-vector conversion
== Pattern classification ==
- Support-Vector Machine: SVMlight. A well-designed, light-weight package for experimenting with the support-vector classifier. Several kernel functions are supported. Uses ASCII data files.
- SVM Torch-II is a new implementation of Vapnik's Support Vector Machine that works both for classification and regression problems, and that has been specifically tailored for large-scale problems (such as more than 20000 examples, even for input dimensions higher than 100).
- Discrete-HMM kernel in C++: Originally developed for speech recognition, this generic package (ASCII data files!) allows for quick experimentation with discrete hidden-Markov modeling. The main program handles a single HMM, so multi-class recognition can be realized with (Unix) scripts.
- AutoClass: An unsupervised Bayesian classification program (NASA). Some data modeling (e.g., specifying all feature scale types) and structuring of the (ASCII) files is required.
- PCA: Principal Components Analysis, compact single main program written in C. Reads ASCII input files.
== Information Retrieval ==
- SMART 11.0: A package implementing the keyword vector-space approach for IR as introduced by Salton (1961). The source code is for SunOS, but has been ported to Linux by several groups. Extensive documentation is available on the WWW.
== Tools for (linguistic) post processing ==
- Word lists of a few Western languages.
- Link Grammar 4.1: A parser for English, written in C, by Temperley, Sleator and Lafferty at Carnegie Mellon.
- Ontolingua: Semantic modeling tool on WWW by Stanford University.
There is a European mirror site. Ontologies can be exported in a number of formats, including KIF, CLIPS, Loom and Prolog. This is a generic tool, but it can be used for content-related or document-related modeling in the context of machine reading.
Benchmarking Tools
- [algoval.html Algoval]: Internet-based algorithm evaluation. Several benchmarks in the area of TC-11 are already available (digit recognition, dictionary search, region-of-interest (ROI) detection). Algorithms written in Java can be uploaded and compared (Simon Lucas, Univ. of Essex).
- PinkPanther document-segmentation benchmarking.
General
=== Learning and Optimization ===
The software packages mentioned on this page are, mostly and preferably, available in source-code form (C, C++, Tcl/Tk, Java) and require standard ASCII input files. Please do not hesitate to tell me about free source code for text processing that is available on the Internet.
schomaker@ai.rug.nl