DAS-Discussion: Systems that improve with use (2016)

Last updated: 2016-05-16

DAS Working Subgroup Meeting: Systems that improve with use

Authors:

  • Ido Kissos, Tel Aviv, Israel - Improving printed Arabic OCR with ML methods.

Participants:

  • Marc-Peter Schambach, Siemens, Germany: Handwriting recognition, address recognition.
  • Abdel Belaid, University of Lorraine, France. Administrative docs, info extraction, table detection, entity recognition, OCR and evaluation.
  • Nicolas Ragaut, University of Tours, France. Medical image analysis, doc image analysis. Handwriting and OCR.
  • Brian Davis, Utah, USA. Computer assisted transcription (genealogical docs).
  • Anh Le, TUAT, Japan - Handwriting recognition.

Problem Definition

  • Data-centric systems break when their usage environment changes
  • Goal: Handling changes over time in data-centric systems
    • Data changes
    • User needs or preferences change
    • Training data is no longer representative
    • Problem domain adaptation
  • Method: Exploit usage data to improve systems
    • Explicit: explicit data labeling in the workflow, dedicated configuration modules
    • Implicit: behavior change, continuous evaluation, negative feedback (see the sketch after this list)
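
As a concrete illustration of the explicit/implicit feedback point above, the sketch below queues user corrections and silent acceptances as labeled samples for later retraining. It is a minimal sketch only; the class, field, and method names are hypothetical and not part of any system discussed by the group.

    # Minimal sketch, assuming an OCR-like workflow where the user may edit the
    # recognized text; all names here are hypothetical illustrations.
    from dataclasses import dataclass, field
    from typing import List, Tuple


    @dataclass
    class FeedbackCollector:
        """Collects usage data as (input id, label) pairs for later retraining."""
        retraining_queue: List[Tuple[str, str]] = field(default_factory=list)

        def on_explicit_correction(self, image_id: str, corrected_text: str) -> None:
            # Explicit feedback: the user edited the recognized text, so the
            # corrected string can serve as a label for that image.
            self.retraining_queue.append((image_id, corrected_text))

        def on_implicit_acceptance(self, image_id: str, predicted_text: str) -> None:
            # Implicit feedback: the user validated the document without touching
            # this field, so the prediction is kept as a (noisy) label.
            self.retraining_queue.append((image_id, predicted_text))


    collector = FeedbackCollector()
    collector.on_explicit_correction("doc42_line3.png", "Invoice No. 10583")
    collector.on_implicit_acceptance("doc42_line4.png", "Total: 97.20 EUR")
    print(len(collector.retraining_queue), "new labeled samples queued")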

Challenges

  1. Algorithmic – online training with the new data
  2. Architecture – how to close the loop
  3. Labeling and data correctness – interpret all kinds of user behavior
  4. Privacy – getting feedback is non-trivial
  5. Psychological – we do not want to look at systems that get worse with new data
  6. Economic – it may have no business case

Solutions - Things to keep in mind

  • Prior analysis of possible drifts
  • Implication of drifts: know the robustness of your model to new data and the gains of more training data
  • “One Click” training
  • Online evaluation
    • Measure performance over time
    • Update ground truth
    • Boosting
    • Independent evaluation model - could an adversary classifier evaluate performance online, without ground truth? (See the sketch below.)
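
A minimal sketch of the "measure performance over time" point, assuming the recognizer exposes a per-item confidence score: a sustained drop of the windowed mean below the delivery-time baseline is used as a cheap proxy for a performance drop, with no ground truth involved. The class name, window size, and thresholds are illustrative assumptions.

    from collections import deque


    class ConfidenceDriftMonitor:
        """Flags a possible drift when mean confidence falls below a baseline."""

        def __init__(self, window: int = 500, baseline: float = 0.90, tolerance: float = 0.05):
            self.scores = deque(maxlen=window)  # rolling window of recent confidences
            self.baseline = baseline            # mean confidence measured at delivery time
            self.tolerance = tolerance          # allowed drop before raising an alert

        def observe(self, confidence: float) -> bool:
            """Record one prediction's confidence; return True if drift is suspected."""
            self.scores.append(confidence)
            if len(self.scores) < self.scores.maxlen:
                return False  # not enough evidence yet
            mean = sum(self.scores) / len(self.scores)
            return mean < self.baseline - self.tolerance


    monitor = ConfidenceDriftMonitor(window=3, baseline=0.90, tolerance=0.05)
    for c in [0.93, 0.91, 0.80, 0.78, 0.76]:
        if monitor.observe(c):
            print("possible drift, windowed mean confidence:",
                  round(sum(monitor.scores) / len(monitor.scores), 2))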

Full Version

Until recently, data changes occurred at intervals longer than a system’s life-span, and systems did not rely primarily on vast amounts of diverse data. Data or user-need drifts were handled by new versions of the software, which was also the software companies’ business model: sell new versions of the same product. Nowadays much software is becoming data-centric, relying on classifier modules as its core capability. In the “data era” these changes happen fast, and the features used for classification are sometimes hidden, so users do not realize their data has drifted until the system loses its reliability. Therefore, systems should adapt!

Why don’t we develop such systems? Because our labs do not update with real data. Drifts happen over long time spans, and we are not looking for them - doing so is expensive and time consuming. Maybe there are also psychological issues: we do not want to watch our systems get worse. In academia we are publication-oriented, not production-oriented.

Do we need to support automatic retraining architecturally? If systems get better by adding new data, we must support it. A fully automatic system may be difficult; semi-automatic is more realistic. Users will not be willing to provide a full new labeling as part of their daily use, but one can expect partial or implicit labeling. A system has to learn from its errors so it can correct them in the future, and modelling errors can be a difficult task. A system has to estimate the confidence of its own claims - a one-class classifier may be better at predicting its false-positive errors.
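
One reading of the one-class-classifier remark, sketched below with scikit-learn's IsolationForest (our choice of tool, not necessarily what the group had in mind): fit the model on simple features of outputs known to be correct, then flag outputs that look anomalous as likely false positives and route them to a human. The two features used here are illustrative assumptions.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)

    # Features of validated-correct outputs: [recognizer confidence, token length].
    correct_outputs = np.column_stack([
        rng.normal(0.92, 0.03, 200),  # typically high confidence
        rng.normal(6.0, 1.5, 200),    # typical word length
    ])

    detector = IsolationForest(random_state=0).fit(correct_outputs)

    # Outputs produced after delivery; the last one looks unusual.
    new_outputs = np.array([
        [0.94, 6.0],
        [0.90, 5.0],
        [0.55, 14.0],
    ])
    flags = detector.predict(new_outputs)  # +1 = looks like a correct output, -1 = suspicious
    for row, flag in zip(new_outputs, flags):
        print(row, "route to human review" if flag == -1 else "accept")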

What about online evaluation - evaluating after the benchmark ends and the system is delivered? One could build an online adversary system that evaluates the main system without ground truth, after delivery time. Sometimes evaluation can be measured explicitly or implicitly through user activity. The feedback on the data does not necessarily have to come from the user; it can come from the processing chain.
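
To illustrate feedback coming from the processing chain rather than the user, the toy sketch below scores extracted fields against downstream validation rules (a date pattern, an amount pattern); failed checks can flow back as implicit negative feedback. The field names and rules are hypothetical assumptions, not from the discussion.

    import re

    # Hypothetical downstream validation rules for extracted fields.
    VALIDATORS = {
        "invoice_date": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
        "total_amount": re.compile(r"^\d+\.\d{2}$"),
    }


    def chain_feedback(extracted: dict) -> dict:
        """Return per-field pass/fail signals usable as weak evaluation labels."""
        return {
            name: bool(VALIDATORS[name].match(value))
            for name, value in extracted.items()
            if name in VALIDATORS
        }


    print(chain_feedback({"invoice_date": "16/05/2016", "total_amount": "97,2O"}))
    # {'invoice_date': True, 'total_amount': False} -> the amount goes back as a negative sample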

Business problem: if I build a system that adapts to all changes, there will be no incentive for the next project. People pay only once, so the economics work against such a system. You then move to subscription models rather than one-time purchase.

Big data: from the day of delivery it is no longer adapted to your problem. There is no real “ground truth”.

Privacy: we are not allowed to get data back from the user. Governments are strict about it, and it is getting stricter. On the other hand, big corporations have all the data, or a big enough representation of it. Possible directions are anonymizing data to allow research on top of it, or easing the regulation to make the data market fairer.

Robustness of the training: what will be the impact of the improvement in the future? Try to devise a score that tells how much more training data is needed for how much improvement - what the gains are. Maybe such scores should be standardized.
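
A rough sketch of such a score using a learning curve, shown on a toy scikit-learn dataset as a stand-in (in practice the system's own corpus would be used): the marginal improvement per added training sample in the last segment gives a crude estimate of the gain from collecting more data.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    X, y = load_digits(return_X_y=True)
    sizes, train_scores, val_scores = learning_curve(
        LogisticRegression(max_iter=1000),
        X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=3,
    )

    val_mean = val_scores.mean(axis=1)
    for n, score in zip(sizes, val_mean):
        print(f"{n:4d} training samples -> validation accuracy {score:.3f}")

    # Crude "gain per extra sample" over the last segment of the curve.
    gain = (val_mean[-1] - val_mean[-2]) / (sizes[-1] - sizes[-2])
    print(f"marginal gain: {gain:.6f} accuracy per additional sample")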