{| style="width: 100%"
|-
| align="right" |
{|
|-
| {{Last updated}}
|}
|}

== On-line handwriting ==
 
* [http://htk.eng.cam.ac.uk/ HTK - Hidden Markov Model Toolkit]
* [http://github.com/meierue/RNNLIB Implementation of Bidirectional Long Short-Term Memory networks (BLSTM) combined with Connectionist Temporal Classification (CTC), including examples for Arabic recognition]
* [http://www.speech.sri.com/projects/srilm/ SRILM - A toolkit for building statistical language models]
* [http://torch5.sourceforge.net/ Torch5 - A toolkit for HMMs, GMMs and many other machine-learning algorithms]
* [http://unipen.nici.kun.nl/uptools3/ '''uptools:'''] Tools for reading and processing files in the UNIPEN file format (a minimal parsing sketch follows this list).
* [http://www.alphaworks.ibm.com/tech/comparehwr Comparison Tools for Handwriting Recognizers] using the UNIPEN format (Gene Ratzlaff, IBM)
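
As a point of reference for the '''uptools''' entry above, the sketch below shows roughly what reading UNIPEN pen data involves: coordinate samples are grouped into strokes between pen-down and pen-up keywords. It is a deliberately simplified illustration in Python, not the uptools API; it assumes plain numeric sample lines and ignores the many other UNIPEN keywords (.SEGMENT, .COORD declarations, annotation layers) that a real reader must handle.

<pre>
# Simplified reader for the pen-trajectory part of a UNIPEN file: it collects
# the coordinate lines between .PEN_DOWN and .PEN_UP keywords into one stroke
# per pen-down segment. Assumes two numbers (x, y) per sample line; extra
# columns (e.g. time or pressure) are ignored.

def read_strokes(path):
    strokes, current = [], None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(".PEN_DOWN"):
                current = []                         # a new stroke begins
            elif line.startswith(".PEN_UP"):
                if current:
                    strokes.append(current)          # close the current stroke
                current = None
            elif current is not None and line and not line.startswith("."):
                x, y = map(float, line.split()[:2])  # keep x and y only
                current.append((x, y))
    return strokes

if __name__ == "__main__":
    import sys
    for i, stroke in enumerate(read_strokes(sys.argv[1])):
        print(f"stroke {i}: {len(stroke)} samples")
</pre>
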
== Off-line handwriting ==
 
* [http://esewww.essex.ac.uk/research/vasa/hueweb/huereal.html '''HUE:'''] a software toolkit which supports the rapid development and re-use of handwriting and document analysis systems (Univ. of Essex, UK).

== OCR ==

* [http://code.google.com/p/ocropus/ OCRopus - The OCRopus(tm) open source document analysis and OCR system]
* [http://code.google.com/p/nhocr/ NHocr - OCR engine for the Japanese language]
* [http://documents.cfar.umd.edu/ocr/ Public domain OCR software] (Univ. of Maryland, USA)
* [http://documents.cfar.umd.edu/resources/source/ Source code at the DIMUND server] (Univ. of Maryland, USA)
* [http://www.fmi.uni-passau.de/~buckley/OCR.html Optical Character Recognition sources]
* [https://www-i6.informatik.rwth-aachen.de/rwth-ocr/ RWTH OCR - The RWTH Aachen University Optical Character Recognition System]

== Pixels vs Vectors ==

== Pattern classification ==

* Support-Vector Machine: SVMlight, a well-designed, lightweight package for experimentation with the support-vector classifier. Several kernel functions are supported. ASCII data files.
* SVM Torch-II is an implementation of Vapnik's Support Vector Machine that works for both classification and regression problems and has been specifically tailored for large-scale problems (more than 20000 examples, even for input dimensions higher than 100).
* Discrete-HMM kernel in C++. Originally developed for speech recognition, this generic package (ASCII data files!) allows for quick experimentation with discrete hidden-Markov modeling. A single HMM is handled by the main program, so multiple-class recognition can be realized with (Unix) scripts (a minimal sketch of this one-model-per-class scheme follows this list).
* AutoClass: An unsupervised Bayesian classification program (NASA). Some data modeling (e.g., specifying all feature scale types) and structuring of the (ASCII) files is required.
* PCA: Principal Components Analysis, a compact single main program written in C. Reads ASCII input files.
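
To make the one-model-per-class scheme of the Discrete-HMM entry concrete, here is a small sketch in Python rather than C++: each class gets its own discrete HMM, an observation sequence is scored with the forward algorithm under every model, and the best-scoring class wins. The toy parameters and symbol sequence below are invented for illustration only; a real system would train the models (e.g. with Baum-Welch) on quantized feature sequences read from data files.

<pre>
import numpy as np
from scipy.special import logsumexp

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under one HMM.

    obs    : sequence of symbol indices
    log_pi : (N,)   log initial-state probabilities
    log_A  : (N, N) log transition probabilities, A[i, j] = P(state j | state i)
    log_B  : (N, M) log emission probabilities,   B[i, k] = P(symbol k | state i)
    """
    alpha = log_pi + log_B[:, obs[0]]                                  # initialisation
    for symbol in obs[1:]:                                             # recursion over time
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, symbol]
    return logsumexp(alpha)                                            # termination

# One toy model per class (random parameters stand in for trained ones).
rng = np.random.default_rng(0)

def random_model(n_states=3, n_symbols=4):
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return np.log(pi), np.log(A), np.log(B)

models = {label: random_model() for label in "abc"}
observation = [0, 2, 1, 1, 3, 0]        # a quantized feature sequence (toy data)

# Score the sequence under every class model and pick the best one:
# the "one HMM per class, glued together by scripts" scheme described above.
scores = {label: log_forward(observation, *model) for label, model in models.items()}
print(max(scores, key=scores.get), scores)
</pre>
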
== Information Retrieval ==

* SMART 11.0: A package implementing the keyword vector-space approach to IR as introduced by Salton (1961). The source code is for SunOS, but it has been ported to Linux by several groups. There is extensive documentation on the web (a toy example of the vector-space idea follows below).
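
The vector-space idea behind SMART can be shown in a few lines: documents and the query are mapped to term-weight vectors and ranked by cosine similarity. The three "documents" below are invented and the weighting is plain TF-IDF; SMART itself adds stemming, stop lists and a whole family of weighting schemes on top of this.

<pre>
import math
from collections import Counter

documents = {
    "doc1": "on line handwriting recognition with hidden markov models",
    "doc2": "optical character recognition for printed japanese text",
    "doc3": "benchmarking document segmentation algorithms",
}
query = "handwriting recognition benchmark"

def tokenize(text):
    return text.lower().split()

# Inverse document frequency over the toy collection.
doc_tokens = {name: tokenize(text) for name, text in documents.items()}
vocabulary = {term for tokens in doc_tokens.values() for term in tokens}
idf = {term: math.log(len(documents) / sum(term in tokens for tokens in doc_tokens.values()))
       for term in vocabulary}

def tfidf_vector(tokens):
    counts = Counter(term for term in tokens if term in idf)
    return {term: count * idf[term] for term, count in counts.items()}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values())) *
            math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Rank the documents against the query, best match first.
query_vector = tfidf_vector(tokenize(query))
ranking = sorted(documents, reverse=True,
                 key=lambda name: cosine(query_vector, tfidf_vector(doc_tokens[name])))
print(ranking)
</pre>
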
== Tools for (linguistic) post processing ==

* Word lists of a few Western languages (a small lexicon-lookup example follows this list).
* Link Grammar 4.1: A parser for English, written in C, by Temperley, Sleator and Lafferty at Carnegie Mellon.
* Ontolingua: A semantic modeling tool on the WWW by Stanford University. There is a European mirror site. Ontologies can be exported in a number of formats, including KIF, CLIPS, Loom and Prolog. This is a generic tool, but it can be used for content-related or document-related modeling in the context of machine reading.
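
A word list becomes useful once recognizer output is snapped to the nearest lexicon entry. The sketch below uses Python's standard difflib matcher with an invented mini-lexicon, purely to illustrate the idea; real post-processing would combine a full word list with character-confusion statistics and a language model.

<pre>
import difflib

# A toy lexicon; in practice this would be one of the word lists mentioned above.
lexicon = ["handwriting", "recognition", "reading", "research", "document",
           "segmentation", "benchmark", "retrieval"]

def correct(word, cutoff=0.6):
    """Map a (possibly misrecognized) word to the closest lexicon entry.

    difflib.get_close_matches ranks candidates by a similarity ratio;
    if nothing clears the cutoff, the word is returned unchanged.
    """
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Typical recognizer confusions (q/g, rn/m) on toy input.
recognized = ["handwritinq", "recoqnition", "docurnent"]
print([correct(word) for word in recognized])
</pre>
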
== Benchmarking Tools ==
 
* [[Software - Algoval|Algoval]] Internet-based algorithm evaluation. Several benchmarks in the area of TC-11 are already present (digit recognition, dictionary search, region-of-interest (ROI) detection). Algorithms in Java can be uploaded and compared (Simon Lucas, Univ. of Essex).
* [http://documents.cfar.umd.edu/resources/source/ppanther/ PinkPanther] document-segmentation benchmarking.
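
Whatever the benchmark, recognizers are ultimately compared by aligning their output with a ground-truth transcription. For text output the usual summary number is the character error rate: edit distance divided by the length of the reference. The minimal version below, with invented strings, illustrates just this one metric; the services listed above define their own task-specific measures and evaluation protocols.

<pre>
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, via dynamic programming."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(previous[j] + 1,                              # deletion
                               current[j - 1] + 1,                           # insertion
                               previous[j - 1] + (ref_char != hyp_char)))    # substitution
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

# Toy example: ground truth versus a recognizer's output.
print(character_error_rate("handwriting recognition", "handwr1ting recogn1tion"))
</pre>
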
  
== General ==

== Learning and Optimization ==
 
* [http://www.aic.nrl.navy.mil/~aha/research/machine-learning.html Machine Learning]
 
  
<!--
----
<font size="-1"> The software packages mentioned on this page are - mostly and preferably - available in source-code format (C, C++, Tcl/Tk, Java) and require standard ASCII input files. Please do not hesitate to give me a hint about free source code in the area of text processing on the Internet. <br />[mailto:schomaker@ai.rug.nl schomaker@ai.rug.nl] </font>
-->

----
This page is editable only by [[IAPR-TC11:Reading_Systems#TC11_Officers|TC11 Officers]].
