Data Preparation: Introduction

Sourcing and preparing data for modeling is one of the most difficult and tedious steps in machine learning. The process is iterative and may take several passes to get the best results. The actual steps and techniques depend on the use case, the dataset, and the chosen algorithms; as such, domain expertise is key.

Proper data preparation often contributes more to model quality than a clever algorithm does.

Goal

The goal of this phase of the machine learning process is to transform the data into a format suitable for machine learning algorithms, so that the resulting models can perform as well as possible.

Tasks

  • Data blending: combine data from multiple sources into a single dataset
  • Data cleaning: find and correct mistakes or errors in the data
  • Dimensionality reduction: reduce the number of features under consideration
  • Data transformation: convert the data to meet the requirements of specific tools or algorithms
  • Feature engineering: construct supplementary features from the available data
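
The five tasks above can be sketched on a toy dataset. This is a minimal illustration using only the Python standard library, not a recommended pipeline: the records, column names, and helper functions (`blend`, `clean`, `drop_constant_features`, `min_max_scale`, `add_avg_order_value`) are all hypothetical, and each step uses the simplest technique that fits the task (for example, dropping constant columns stands in for dimensionality reduction).

```python
# Hypothetical toy data: two sources that share a customer id.
customers = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": None, "country": "DE"},   # missing value
    {"id": 3, "age": 51, "country": "DE"},
]
orders = {
    1: {"n_orders": 4, "total_spent": 200.0},
    2: {"n_orders": 1, "total_spent": 30.0},
    3: {"n_orders": 10, "total_spent": 250.0},
}


def blend(customers, orders):
    """Data blending: join the two sources on the shared id."""
    return [{**c, **orders[c["id"]]} for c in customers if c["id"] in orders]


def clean(rows):
    """Data cleaning: fill missing ages with the mean of the known ages."""
    known = [r["age"] for r in rows if r["age"] is not None]
    mean_age = sum(known) / len(known)
    return [
        {**r, "age": mean_age if r["age"] is None else r["age"]} for r in rows
    ]


def drop_constant_features(rows):
    """Dimensionality reduction (simplest form): drop single-valued columns."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def add_avg_order_value(rows):
    """Feature engineering: derive a new feature from existing ones."""
    return [
        {**r, "avg_order_value": r["total_spent"] / r["n_orders"]} for r in rows
    ]


def min_max_scale(rows, column):
    """Data transformation: rescale one column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant column
    return [{**r, column: (r[column] - lo) / span} for r in rows]


def prepare(customers, orders):
    """Run the steps in one possible order; real projects iterate on this."""
    rows = blend(customers, orders)
    rows = clean(rows)
    rows = drop_constant_features(rows)
    rows = add_avg_order_value(rows)
    rows = min_max_scale(rows, "age")
    return rows


prepared = prepare(customers, orders)
for row in prepared:
    print(row)
```

The order of the steps is itself a design choice: here the derived feature is computed before scaling so that it is based on the raw values. In practice, which steps are needed and in what order depends on the use case, the dataset, and the chosen algorithm.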

Roles

  • Domain Expert
  • Business User
  • Data Scientist
  • Data Engineer
  • Database Administrator