Data Preparation: Introduction

Table of Contents

Goal
Tasks
Roles

Sourcing and preparing data for modeling is one of the most difficult and tedious steps in machine learning. The process is iterative and may require several passes to get the best results. The actual steps and techniques depend on the use case, the dataset, and the chosen algorithms, so domain expertise is key.

Proper data preparation often drives more value than clever algorithms.

Goal

The goal in this phase of the machine learning process is to transform data into a format that is suitable for machine learning algorithms and that lets the resulting models perform as well as possible.

Tasks

Data blending: combine data from multiple sources to create a single dataset.
Data cleaning: find and rectify mistakes or errors in the data.
Dimensionality reduction: reduce the number of features under consideration.
Data transformation: convert data to meet the requirements of certain tools or algorithms.
Feature engineering: construct supplementary features from the available data.
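
The five tasks above can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy records, the column names (`age`, `country`, `total_spend`, `n_orders`), and the specific technique chosen for each task (mean imputation for cleaning, dropping constant columns as the simplest form of dimensionality reduction, min-max scaling for transformation). Real projects would typically pick techniques based on the use case and often use a library such as pandas or scikit-learn.

```python
from statistics import mean

# Hypothetical toy data: two sources that share a customer id.
customers = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "US"},   # missing age
    {"id": 3, "age": 52, "country": "US"},
]
orders = [
    {"id": 1, "total_spend": 120.0, "n_orders": 4},
    {"id": 2, "total_spend": 60.0, "n_orders": 3},
    {"id": 3, "total_spend": 300.0, "n_orders": 5},
]


def blend(left, right, key):
    """Data blending: inner-join two lists of records on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def clean(rows, column):
    """Data cleaning: fill missing values in a numeric column with the mean."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def drop_constant(rows, key):
    """Dimensionality reduction (simplest form): drop single-valued features."""
    constant = {c for c in rows[0] if c != key and len({r[c] for r in rows}) == 1}
    return [{c: v for c, v in r.items() if c not in constant} for r in rows]


def add_avg_order_value(rows):
    """Feature engineering: derive a new feature from existing ones."""
    return [{**r, "avg_order_value": r["total_spend"] / r["n_orders"]} for r in rows]


def min_max_scale(rows, column):
    """Data transformation: rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [{**r, column: (r[column] - lo) / span} for r in rows]


def prepare(customers, orders):
    """Chain the five tasks; the order shown is one reasonable choice, not a rule."""
    rows = blend(customers, orders, key="id")
    rows = clean(rows, "age")
    rows = drop_constant(rows, key="id")
    rows = add_avg_order_value(rows)
    return min_max_scale(rows, "age")


prepared = prepare(customers, orders)
for row in prepared:
    print(row)
```

Each function returns new records instead of modifying its input, which makes it easy to rerun or reorder individual steps as the preparation is iterated.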

Roles

Domain Expert
Business User
Data Scientist
Data Engineer
Database Administrator


Updated on 6 July, 2022
