{| style="width: 100%"
|-
| align="right" |
{|
|-
| {{Last updated}}
|}
|}

== On-line handwriting ==
 
* [http://htk.eng.cam.ac.uk/ HTK - Hidden Markov Model Toolkit]
* [http://github.com/meierue/RNNLIB Implementation of Bidirectional Long Short-Term Memory networks (BLSTM) combined with Connectionist Temporal Classification (CTC), including examples for Arabic recognition]
* [http://www.speech.sri.com/projects/srilm/ SRILM - A toolkit for building statistical language models]
* [http://torch5.sourceforge.net/ Torch5 - A toolkit for HMMs, GMMs and many other machine-learning algorithms]
* [http://unipen.nici.kun.nl/uptools3/ '''uptools:'''] Tools for reading and processing files in the UNIPEN file format (a minimal parsing sketch follows this list).
* [http://www.alphaworks.ibm.com/tech/comparehwr Comparison Tools for Handwriting Recognizers] using the UNIPEN format (Gene Ratzlaff, IBM)
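
As a point of reference for the '''uptools''' entry above, the sketch below shows roughly what reading UNIPEN pen data involves: coordinate samples are grouped into strokes between pen-down and pen-up keywords. It is a deliberately simplified illustration in Python, not the uptools API; it assumes plain numeric sample lines and ignores the many other UNIPEN keywords (.SEGMENT, .COORD declarations, annotation layers) that a real reader must handle.

<pre>
# Simplified reader for the pen-trajectory part of a UNIPEN file: it collects
# the coordinate lines between .PEN_DOWN and .PEN_UP keywords into one stroke
# per pen-down segment. Assumes two numbers (x, y) per sample line; extra
# columns (e.g. time or pressure) are ignored.

def read_strokes(path):
    strokes, current = [], None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(".PEN_DOWN"):
                current = []                         # a new stroke begins
            elif line.startswith(".PEN_UP"):
                if current:
                    strokes.append(current)          # close the current stroke
                current = None
            elif current is not None and line and not line.startswith("."):
                x, y = map(float, line.split()[:2])  # keep x and y only
                current.append((x, y))
    return strokes

if __name__ == "__main__":
    import sys
    for i, stroke in enumerate(read_strokes(sys.argv[1])):
        print(f"stroke {i}: {len(stroke)} samples")
</pre>
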
== Off-line handwriting ==
 
* [http://esewww.essex.ac.uk/research/vasa/hueweb/huereal.html '''HUE:'''] a software toolkit which supports the rapid development and re-use of handwriting and document analysis systems (Univ. of Essex, UK).

== OCR ==

* [http://code.google.com/p/ocropus/ OCRopus - The OCRopus(tm) open source document analysis and OCR system]
* [http://code.google.com/p/nhocr/ NHocr - OCR engine for the Japanese language]
* [http://documents.cfar.umd.edu/ocr/ Public domain OCR software] (Univ. of Maryland, USA)
* [http://documents.cfar.umd.edu/resources/source/ Source code at the DIMUND server] (Univ. of Maryland, USA)
* [http://www.fmi.uni-passau.de/~buckley/OCR.html Optical Character Recognition sources]
* [https://www-i6.informatik.rwth-aachen.de/rwth-ocr/ RWTH OCR - The RWTH Aachen University Optical Character Recognition System]

== Pixels vs Vectors ==

== Pattern classification ==

* Support-Vector Machine: SVMlight, a well-designed, lightweight package for experimentation with the support-vector classifier. Several kernel functions are supported. ASCII data files.
* SVM Torch-II is an implementation of Vapnik's Support Vector Machine that works for both classification and regression problems and has been specifically tailored for large-scale problems (more than 20000 examples, even for input dimensions higher than 100).
* Discrete-HMM kernel in C++. Originally developed for speech recognition, this generic package (ASCII data files!) allows for quick experimentation with discrete hidden-Markov modeling. A single HMM is handled by the main program, so multiple-class recognition can be realized with (Unix) scripts (a minimal sketch of this one-model-per-class scheme follows this list).
* AutoClass: An unsupervised Bayesian classification program (NASA). Some data modeling (e.g., specifying all feature scale types) and structuring of the (ASCII) files is required.
* PCA: Principal Components Analysis, a compact single main program written in C. Reads ASCII input files.
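
To make the one-model-per-class scheme of the Discrete-HMM entry concrete, here is a small sketch in Python rather than C++: each class gets its own discrete HMM, an observation sequence is scored with the forward algorithm under every model, and the best-scoring class wins. The toy parameters and symbol sequence below are invented for illustration only; a real system would train the models (e.g. with Baum-Welch) on quantized feature sequences read from data files.

<pre>
import numpy as np
from scipy.special import logsumexp

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    obs    : sequence of symbol indices
    log_pi : (N,)   log initial-state probabilities
    log_A  : (N, N) log transition probabilities, A[i, j] = P(state j | state i)
    log_B  : (N, M) log emission probabilities,   B[i, k] = P(symbol k | state i)
    """
    alpha = log_pi + log_B[:, obs[0]]                                  # initialisation
    for symbol in obs[1:]:                                             # recursion over time
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, symbol]
    return logsumexp(alpha)                                            # termination

# One toy model per class (random parameters stand in for trained ones).
rng = np.random.default_rng(0)

def random_model(n_states=3, n_symbols=4):
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return np.log(pi), np.log(A), np.log(B)

models = {label: random_model() for label in "abc"}
observation = [0, 2, 1, 1, 3, 0]        # a quantized feature sequence (toy data)

# Score the sequence under every class model and pick the best one:
# the "one HMM per class, glued together by scripts" scheme described above.
scores = {label: log_forward(observation, *model) for label, model in models.items()}
print(max(scores, key=scores.get), scores)
</pre>
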
== Information Retrieval ==

* SMART 11.0: A package implementing the keyword vector-space approach to IR as introduced by Salton (1961). The source code is for SunOS, but it has been ported to Linux by several groups. There is extensive documentation on the web (a toy example of the vector-space idea follows below).
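
The vector-space idea behind SMART can be shown in a few lines: documents and the query are mapped to term-weight vectors and ranked by cosine similarity. The three "documents" below are invented and the weighting is plain TF-IDF; SMART itself adds stemming, stop lists and a whole family of weighting schemes on top of this.

<pre>
import math
from collections import Counter

documents = {
    "doc1": "on line handwriting recognition with hidden markov models",
    "doc2": "optical character recognition for printed japanese text",
    "doc3": "benchmarking document segmentation algorithms",
}
query = "handwriting recognition benchmark"

def tokenize(text):
    return text.lower().split()

# Inverse document frequency over the toy collection.
doc_tokens = {name: tokenize(text) for name, text in documents.items()}
vocabulary = {term for tokens in doc_tokens.values() for term in tokens}
idf = {term: math.log(len(documents) / sum(term in tokens for tokens in doc_tokens.values()))
       for term in vocabulary}

def tfidf_vector(tokens):
    counts = Counter(term for term in tokens if term in idf)
    return {term: count * idf[term] for term, count in counts.items()}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values())) *
            math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Rank the documents against the query, best match first.
query_vector = tfidf_vector(tokenize(query))
ranking = sorted(documents, reverse=True,
                 key=lambda name: cosine(query_vector, tfidf_vector(doc_tokens[name])))
print(ranking)
</pre>
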
== Tools for (linguistic) post processing ==

* Word lists of a few Western languages (a small lexicon-lookup example follows this list).
* Link Grammar 4.1: A parser for English, written in C, by Temperley, Sleator and Lafferty at Carnegie Mellon.
* Ontolingua: A semantic modeling tool on the WWW by Stanford University. There is a European mirror site. Ontologies can be exported in a number of formats, including KIF, CLIPS, Loom and Prolog. This is a generic tool, but it can be used for content-related or document-related modeling in the context of machine reading.
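
A word list becomes useful once recognizer output is snapped to the nearest lexicon entry. The sketch below uses Python's standard difflib matcher with an invented mini-lexicon, purely to illustrate the idea; real post-processing would combine a full word list with character-confusion statistics and a language model.

<pre>
import difflib

# A toy lexicon; in practice this would be one of the word lists mentioned above.
lexicon = ["handwriting", "recognition", "reading", "research", "document",
           "segmentation", "benchmark", "retrieval"]

def correct(word, cutoff=0.6):
    """Map a (possibly misrecognized) word to the closest lexicon entry.

    difflib.get_close_matches ranks candidates by a similarity ratio;
    if nothing clears the cutoff, the word is returned unchanged.
    """
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Typical recognizer confusions (q/g, rn/m) on toy input.
recognized = ["handwritinq", "recoqnition", "docurnent"]
print([correct(word) for word in recognized])
</pre>
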
== Benchmarking Tools ==
 
* [[Software - Algoval|Algoval]] Internet-based algorithm evaluation. Several benchmarks in the area of TC-11 are already present (digit recognition, dictionary search, region-of-interest (ROI) detection). Algorithms in Java can be uploaded and compared (Simon Lucas, Univ. of Essex).
* [http://documents.cfar.umd.edu/resources/source/ppanther/ PinkPanther] document-segmentation benchmarking.
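
Whatever the benchmark, recognizers are ultimately compared by aligning their output with a ground-truth transcription. For text output the usual summary number is the character error rate: edit distance divided by the length of the reference. The minimal version below, with invented strings, illustrates just this one metric; the services listed above define their own task-specific measures and evaluation protocols.

<pre>
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, via dynamic programming."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(previous[j] + 1,                              # deletion
                               current[j - 1] + 1,                           # insertion
                               previous[j - 1] + (ref_char != hyp_char)))    # substitution
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

# Toy example: ground truth versus a recognizer's output.
print(character_error_rate("handwriting recognition", "handwr1ting recogn1tion"))
</pre>
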
  
== General ==

== Learning and Optimization ==
 
* [http://www.aic.nrl.navy.mil/~aha/research/machine-learning.html Machine Learning]
 
  
<!--
----
<font size="-1"> The software packages mentioned on this page are - mostly and preferably - available in source-code format (C, C++, Tcl/Tk, Java) and require standard ASCII input files. Please do not hesitate to give me a hint about free source code in the area of text processing on the Internet. <br />[mailto:schomaker@ai.rug.nl schomaker@ai.rug.nl] </font>
-->

----
This page is editable only by [[IAPR-TC11:Reading_Systems#TC11_Officers|TC11 Officers]].
