Data Integration and Provisioning of Storage Systems for Data Mining in Production-process Data

(Third Party Funds Group – Sub-project)

Overall project: EFRE E|ASY-Opt - Competence and Analysis Project for the "Data-driven Process and Production Optimization with the help of Data Mining and Big Data"
Project leader:
Project members: ,
Start date: 05/02/17
End date: 04/30/22
Acronym: E|ASY-Opt INF6
Funding source: Other EU programmes (e.g. RFCS, DG Health, IMI, Artemis), Bavarian State Ministries


Before data mining on production-process data can start, these data must be integrated, because they come from very different and mostly heterogeneous sources. The goal is to try out state-of-the-art methods and to develop them further so that they match the specific operational environment of SMEs. Storing these data is also anything but trivial, since the volume can be very large ("Big Data"). Today, the so-called NoSQL systems come to mind first, but they are just one of many options, and they offer only limited support for selective access to the data. Depending on data volume, data structures, and, most importantly, the required access patterns, a suitable storage form must be determined that also matches the specific requirements of SMEs, that is, one that does not tie up many resources and does not require extensive administration.

As already stated in the main proposal, production-process data are currently not integrated to any significant extent, let alone evaluated and analyzed. It is assumed, however, that much information can be extracted from them and visualized, which could lead to substantial process improvements, for example in predictive maintenance. This calls for data-mining methods such as prediction, association rules, classification, cluster analysis, and outlier detection. Applying them to these data, however, requires addressing two problem areas:

  1. Heterogeneity of data:
    The source systems create their data in very diverse formats and structures, often as log files. We have some experience with (semantic) integration of heterogeneous data sets and are familiar with the relevant literature. In almost all cases, data mining requires an integrated and cleaned data set as input.
  2. Provisioning of storage systems:
    In the past, a relational database was always used for these tasks. That has changed; the spectrum is much broader now. It ranges from Hadoop/HDFS through (very diverse) NoSQL systems, classical databases, and data warehouses all the way to data-stream-processing and complex-event-processing systems. Which system fits best in which situation, and how to configure it, depends on a large set of parameters: kind of data and accesses, data volume, requirements regarding consistency and fault tolerance, etc. We are currently working on this in two Ph.D. projects. These can benefit from the specific requirements of E|ASY-Opt and, in turn, contribute their general knowledge of methods.
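
As a minimal sketch of what integrating heterogeneous log data can involve, the following Python code maps log lines from two hypothetical machines, which differ in format, field names, timestamp representation, and temperature unit, onto one common record structure. The log formats, the `Reading` schema, and the field names are assumptions made purely for illustration; they are not formats from the project.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Hypothetical common target schema for integrated sensor readings."""
    machine_id: str
    timestamp: datetime
    temp_c: float


def map_machine_a(line: str) -> Reading:
    # Hypothetical format A: "2021-03-01T08:00:00Z;A-17;71.5" (ISO time, deg C)
    ts, machine, temp = line.strip().split(";")
    return Reading(
        machine_id=machine,
        # older Python versions do not accept a trailing "Z" in fromisoformat
        timestamp=datetime.fromisoformat(ts.replace("Z", "+00:00")),
        temp_c=float(temp),
    )


def map_machine_b(line: str) -> Reading:
    # Hypothetical format B: "id=B-03 epoch=1614585600 tempF=160.7" (Unix time, deg F)
    fields = dict(part.split("=", 1) for part in line.split())
    return Reading(
        machine_id=fields["id"],
        timestamp=datetime.fromtimestamp(int(fields["epoch"]), tz=timezone.utc),
        # value mapping: convert Fahrenheit to Celsius
        temp_c=round((float(fields["tempF"]) - 32.0) * 5.0 / 9.0, 2),
    )


# One mapping function per source system
MAPPERS = {"A": map_machine_a, "B": map_machine_b}


def integrate(sources: dict[str, list[str]]) -> list[Reading]:
    """Apply the source-specific mapping and merge all records into one time-ordered list."""
    records = [MAPPERS[src](line) for src, lines in sources.items() for line in lines]
    return sorted(records, key=lambda r: (r.timestamp, r.machine_id))


if __name__ == "__main__":
    merged = integrate({
        "A": ["2021-03-01T08:00:00Z;A-17;71.5"],
        "B": ["id=B-03 epoch=1614585600 tempF=160.7"],
    })
    for r in merged:
        print(r)
```

Both example lines describe the same instant and the same temperature, so after integration they become directly comparable records. Keeping one mapping function per source means a new machine type only requires a new entry in `MAPPERS`; real production logs would additionally need handling of malformed lines.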

Possible work steps include:

  • Comprehensive analysis of data generated in production processes
  • Cleaning of the data sets to be included in data mining (that is, handle missing values, inconsistencies, noise, duplicates, etc.)
  • Integration of data sets from different sources (if necessary, transformation of structures and values, definition and implementation of mapping functions)
  • Selection of a suitable storage form, using the criteria of data volume, data structures, and access patterns
  • Implementation of data access for preparing the data for data mining ("task-relevant data")
  • Incremental update of data sets when new data arrive from the production process
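
The cleaning and incremental-update steps in the list above could look roughly like the following minimal Python sketch. It assumes integrated readings arrive in batches as dictionaries with hypothetical fields `machine_id`, `ts`, and `temp_c`; the plausibility range used to filter noise and the in-memory `IncrementalStore` class are illustrative stand-ins, not a choice of storage system.

```python
from __future__ import annotations

import math


def clean_batch(batch: list[dict], lo: float = -40.0, hi: float = 400.0) -> list[dict]:
    """Clean one batch of readings shaped like {'machine_id', 'ts', 'temp_c'}.

    Field names and the plausibility range [lo, hi] are hypothetical.
    """
    cleaned, seen = [], set()
    for r in batch:
        temp = r.get("temp_c")
        # Missing values: skip records without ID, timestamp, or numeric reading
        if not r.get("machine_id") or r.get("ts") is None:
            continue
        if not isinstance(temp, (int, float)) or math.isnan(temp):
            continue
        # Noise: discard physically implausible readings (simple range check)
        if not lo <= temp <= hi:
            continue
        # Duplicates within the batch: keep the first record per (machine, timestamp)
        key = (r["machine_id"], r["ts"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned


class IncrementalStore:
    """Toy in-memory target that only appends readings it has not stored before."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self._keys: set[tuple] = set()

    def load(self, batch: list[dict]) -> int:
        """Clean a new batch and append unseen records; return the number added."""
        added = 0
        for r in clean_batch(batch):
            key = (r["machine_id"], r["ts"])
            if key not in self._keys:
                self._keys.add(key)
                self.rows.append(r)
                added += 1
        return added


if __name__ == "__main__":
    store = IncrementalStore()
    first = [
        {"machine_id": "A-17", "ts": 100, "temp_c": 71.5},
        {"machine_id": "A-17", "ts": 100, "temp_c": 71.5},    # duplicate
        {"machine_id": "A-17", "ts": 160, "temp_c": None},    # missing value
        {"machine_id": "B-03", "ts": 100, "temp_c": 9999.0},  # implausible
    ]
    print(store.load(first))   # 1
    second = [
        {"machine_id": "A-17", "ts": 100, "temp_c": 71.5},    # already loaded
        {"machine_id": "A-17", "ts": 220, "temp_c": 72.0},
    ]
    print(store.load(second))  # 1
    print(len(store.rows))     # 2
```

In a real deployment, the in-memory key set would typically be replaced by a unique constraint or an upsert in whichever storage system is selected, and a fixed range check would likely give way to more robust noise filters.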



Project Heads

Melanie Bianca Sigl, M. Sc.

  • Address:
    Martensstraße 3
    91058 Erlangen
  • Phone number: +49 9131 85-28683
  • Email:

Prof. Dr. Klaus Meyer-Wegener

  • Job title: Holder of Chair
  • Address:
    Martensstraße 3
    91058 Erlangen
  • Phone number: +49 9131 85-27892
  • Email:

Outer office

  • Cécile Astor
    • Phone number: +49 9131 85-27893
    • Fax number: +49 9131 85-28854
    • Email:
    • Office hours: Weekly Mon, 8:30 - 14:00, Room 08.139,
      Weekly Tue, Wed, Thu, 8:30 - 13:30, Room 08.139

Further Participants

Sophie Russ