<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://iapr-tc11.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=DAS-Discussion%3AInformation_Extraction_%282014%29</id>
	<title>DAS-Discussion:Information Extraction (2014) - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://iapr-tc11.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=DAS-Discussion%3AInformation_Extraction_%282014%29"/>
	<link rel="alternate" type="text/html" href="http://iapr-tc11.org/mediawiki/index.php?title=DAS-Discussion:Information_Extraction_(2014)&amp;action=history"/>
	<updated>2026-04-21T17:17:09Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.31.16</generator>
	<entry>
		<id>http://iapr-tc11.org/mediawiki/index.php?title=DAS-Discussion:Information_Extraction_(2014)&amp;diff=2054&amp;oldid=prev</id>
		<title>Liwicki: /* DAS Working Subgroup Meeting: Information Extraction */</title>
		<link rel="alternate" type="text/html" href="http://iapr-tc11.org/mediawiki/index.php?title=DAS-Discussion:Information_Extraction_(2014)&amp;diff=2054&amp;oldid=prev"/>
		<updated>2015-01-02T13:54:26Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;DAS Working Subgroup Meeting: Information Extraction&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 13:54, 2 January 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l20&quot; &gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Xin TAO&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Xin TAO&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Ronaldo MESSINA (a2ia)&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Ronaldo MESSINA (a2ia)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Nibal NAYEF (&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;me !&lt;/del&gt;)&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Nibal NAYEF (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;France&lt;/ins&gt;)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Bao&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Bao&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2053:rev-2054 --&gt;
&lt;/table&gt;</summary>
		<author><name>Liwicki</name></author>
		
	</entry>
	<entry>
		<id>http://iapr-tc11.org/mediawiki/index.php?title=DAS-Discussion:Information_Extraction_(2014)&amp;diff=2053&amp;oldid=prev</id>
		<title>Liwicki: Created page with &quot;Back to DAS-Discussion:Index {| style=&quot;width: 100%&quot; |- | align=&quot;right&quot; |  {| |- | {{Last updated}} |}  |}  == DAS Working Subgroup Meeting: Information Extraction == Autho...&quot;</title>
		<link rel="alternate" type="text/html" href="http://iapr-tc11.org/mediawiki/index.php?title=DAS-Discussion:Information_Extraction_(2014)&amp;diff=2053&amp;oldid=prev"/>
		<updated>2015-01-02T13:54:02Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;Back to &lt;a href=&quot;/mediawiki/index.php/DAS-Discussion:Index&quot; title=&quot;DAS-Discussion:Index&quot;&gt;DAS-Discussion:Index&lt;/a&gt; {| style=&amp;quot;width: 100%&amp;quot; |- | align=&amp;quot;right&amp;quot; |  {| |- | {{Last updated}} |}  |}  == DAS Working Subgroup Meeting: Information Extraction == Autho...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Back to [[DAS-Discussion:Index]]&lt;br /&gt;
{| style=&amp;quot;width: 100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;right&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|-&lt;br /&gt;
| {{Last updated}}&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== DAS Working Subgroup Meeting: Information Extraction ==&lt;br /&gt;
Authors:&lt;br /&gt;
* Nibal Nayef&lt;br /&gt;
Participants:&lt;br /&gt;
* Yoshinori AKAO (Japanese police)&lt;br /&gt;
* Saddok KEBAIRI (Itesoft)&lt;br /&gt;
* Manaba OHTA&lt;br /&gt;
* Xin TAO&lt;br /&gt;
* Ronaldo MESSINA (a2ia)&lt;br /&gt;
* Nibal NAYEF (me !)&lt;br /&gt;
* Bao&lt;br /&gt;
&lt;br /&gt;
=== Introduction ===&lt;br /&gt;
Participants hold very different views of what information extraction covers.&lt;br /&gt;
The tasks mentioned include:&lt;br /&gt;
* Entity spotting (numbers, words, …)&lt;br /&gt;
* Graphics spotting (logos, symbols, tables etc.)&lt;br /&gt;
* Semantics after text recognition&lt;br /&gt;
* Logical structure&lt;br /&gt;
&lt;br /&gt;
=== What is a document? ===&lt;br /&gt;
There are many types of documents, and the number is growing:&lt;br /&gt;
* Digitally born documents&lt;br /&gt;
* Camera / mobile captured&lt;br /&gt;
* Scanned&lt;br /&gt;
* …&lt;br /&gt;
&lt;br /&gt;
To extract any kind of information from any type of document, we need some kind of “prerequisite” module, so that information extraction (IE) modules can work on all document types.&lt;br /&gt;
&lt;br /&gt;
=== Problems of IE ===&lt;br /&gt;
* What kind of semantic information should we extract? For example, technical terms, …&lt;br /&gt;
* Define the logical structure of a document&lt;br /&gt;
* The same information can appear in different representations, e.g. the same name in different languages&lt;br /&gt;
* What should the ground truth (GT) data be, and how much training data is needed? One suggestion: use human voting to build the GT&lt;br /&gt;
* Ultimate goal: automatic and complete understanding of document contents&lt;br /&gt;
* Application: enriching data mining&lt;br /&gt;
&lt;br /&gt;
=== Approaches ===&lt;br /&gt;
Conditional random fields (CRF), natural language processing (NLP), and all methods for word and graphics spotting&lt;br /&gt;
&lt;br /&gt;
=== Future Directions ===&lt;br /&gt;
Combine methods from different fields:&lt;br /&gt;
* Image processing&lt;br /&gt;
* Natural language processing&lt;br /&gt;
&lt;br /&gt;
Take into account that documents themselves are changing drastically.&lt;/div&gt;</summary>
		<author><name>Liwicki</name></author>
		
	</entry>
</feed>