A Bit of History

Heroic Data Mining is a term that Marcos M. Campos has been using for a while now. Over the years he has worked on Big Data Analytics problems with the following directives:

  • Process data sets with a massive number of rows (e.g., billions)
  • Handle a very large number of attributes (e.g., hundreds of thousands)
  • Overcome poor data quality (e.g., missing values, different scaling, constant values, values not seen during training)

Many people will identify the above requirements with Big Data Analytics. Although finding efficient solutions for these requirements is no small task, this is only part of the story. Due to the limited number of knowledgeable advanced analytics professionals, it is important to address these requirements in an automated fashion. But in the automated processing of massive data sets, failure is not an option. In these instances there are no humans in the loop to react to and correct failures. As a result, the system needs to be able to learn and score data without interruptions or undesirable restarts, using a best-effort approach.

This is a tall order, legendary even. It is a goal to aspire to and strive for. It goes beyond Data Mining and Big Data Analytics. It is truly Heroic Data Mining.