Data Understanding: Introduction

Goal

Initial collection of, and familiarization with, the data required to build the selected model.

This phase overlaps with the Business Understanding and Data Preparation phases.

Tasks

  • Collect initial data and build experimental datasets
  • Identify data quality issues, e.g., missing values, garbage records, duplicates, etc.
  • Determine the suitability of the data for the project in terms of the number of records, the distribution of features, the quality of labels, etc.
  • Discover first insights into the data
  • Find interesting subsets to form hypotheses regarding hidden information
  • Generate ideas about data preparation (the next phase of the project lifecycle)

These tasks require a good understanding of the data and of exploratory data analysis (EDA) techniques, which are discussed in the remainder of this section.
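As a minimal illustration of a few of the checks listed under Tasks (missing values, duplicates, number of records, label distribution), the Python sketch below profiles a small dataset using only the standard library. It assumes the data has already been loaded as a list of dictionaries with a label field; the function name `profile_records` and the record layout are hypothetical and not part of this guide. In practice a dataframe library would typically be used for the same checks on larger data.

```python
from collections import Counter
from typing import Any


def profile_records(records: list[dict[str, Any]], label_key: str) -> dict[str, Any]:
    """Summarize basic data-quality and suitability signals (hypothetical helper)."""
    # Union of all field names seen across records, in a stable order.
    fields = sorted({key for rec in records for key in rec})

    # Treat absent fields, None, and empty strings as missing values.
    missing = {
        field: sum(1 for rec in records if rec.get(field) in (None, ""))
        for field in fields
    }

    # Count exact duplicates: records whose full content repeats an earlier record.
    seen: set[tuple] = set()
    duplicates = 0
    for rec in records:
        key = tuple((field, rec.get(field)) for field in fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # Label distribution: a first look at class balance and label quality.
    labels = Counter(rec.get(label_key) for rec in records)

    return {
        "n_records": len(records),
        "missing": missing,
        "duplicates": duplicates,
        "labels": dict(labels),
    }


if __name__ == "__main__":
    sample = [
        {"id": 1, "age": 34, "label": "yes"},
        {"id": 2, "age": None, "label": "no"},
        {"id": 2, "age": None, "label": "no"},  # exact duplicate of the previous record
        {"id": 3, "age": 51, "label": ""},      # empty label
    ]
    print(profile_records(sample, "label"))
```

On the sample above, the summary flags two missing ages, one empty label, and one duplicate record among four records, which are the kinds of findings that feed into the Data Preparation phase.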

Roles

  • Project Sponsor
  • Domain Expert
  • Business User
  • Data Engineer
  • Database Administrator
  • Data Scientist/Machine Learning Engineer
  • Data Developer/Software Engineer